Data rate equalisation to account for relatively different printhead widths

ABSTRACT

A printer controller configured to generate dot data for supply to a printhead that includes at least first and second longitudinally extending printhead chips that are positioned adjacent each other either side of a join region such that a printing width of the printhead is wider than the length of either printhead chip, the printer controller being configured such that, in the event that the printhead chips to which dot data is being supplied are of sufficiently unequal relative length, the dot data is supplied more frequently, or at a higher rate, to the longer of the two printhead chips.

FIELD OF INVENTION

[0001] The present invention relates to techniques for providing data to a printhead such that the requirements of different length printhead modules comprising the printhead are taken into account.

[0002] The invention has primarily been developed for use with a printhead comprising one or more printhead modules constructed using microelectromechanical systems (MEMS) techniques, and will be described with reference to this application. However, it will be appreciated that the invention can be applied to other types of printing technologies in which analogous problems are faced.

BACKGROUND OF INVENTION

[0003] Manufacturing a printhead that has relatively high resolution and print-speed raises a number of problems.

[0004] Difficulties in manufacturing pagewidth printheads of any substantial size arise due to the relatively small dimensions of standard silicon wafers that are used in printhead (or printhead module) manufacture. For example, if it is desired to make an 8 inch wide pagewidth printhead, only one such printhead can be laid out on a standard 8-inch wafer, since such wafers are circular in plan. Manufacturing a pagewidth printhead from two or more smaller modules can reduce this limitation to some extent, but raises other problems related to providing a joint between adjacent printhead modules that is precise enough to avoid visible artefacts (which would typically take the form of noticeable lines) when the printhead is used. The problem is exacerbated in relatively high-resolution applications because of the tight tolerances dictated by the small spacing between nozzles.

[0005] The quality of a joint region between adjacent printhead modules relies on factors including the precision with which the abutting ends of each module can be manufactured, the accuracy with which they can be aligned when assembled into a single printhead, and other more practical factors such as management of ink channels behind the nozzles. It will be appreciated that the difficulties include relative vertical displacement of the printhead modules with respect to each other.

[0006] Whilst some of these issues may be dealt with by careful design and manufacture, the level of precision required renders it relatively expensive to manufacture printheads within the required tolerances. It would be desirable to provide a solution to one or more of the problems associated with precision manufacture and assembly of multiple printhead modules to form a printhead, and especially a pagewidth printhead.

[0007] In some cases, it is desirable to produce a number of different printhead module types or lengths on a substrate to maximise usage of the substrate's surface area. However, different sizes and types of modules will have different numbers and layouts of print nozzles, potentially including different horizontal and vertical offsets. Where two or more modules are to be joined to form a single printhead, there is also the problem of dealing with different seam shapes between abutting ends of joined modules, which again may incorporate vertical or horizontal offsets between the modules. Printhead controllers are usually dedicated application specific integrated circuits (ASICs) designed for specific use with a single type of printhead module that is used by itself rather than with other modules. It would be desirable to provide a way in which different lengths and types of printhead modules could be accounted for using a single printer controller.

[0008] Printer controllers face other difficulties when two or more printhead modules are involved, especially if it is desired to send dot data to each of the printhead modules directly (rather than via a single printhead module connected to the controller). One concern is that data delivered to different length modules at the same rate will cause the shorter of the modules to be ready for printing before any longer modules. Where there is little difference involved, the issue may not be of importance, but for large length differences, the result is that the bandwidth of a shared memory from which the dot data is supplied to the modules is effectively left idle once one of the modules is full and the remaining module or modules are still being filled. It would be desirable to provide a way of improving memory bandwidth usage in a system comprising a plurality of printhead modules of uneven length.
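By way of illustration only, one way of equalising the fill of two modules of unequal length may be sketched as follows. The sketch assumes that dot data is transferred in equal-sized units and that the lengths of the modules are known in those units; the function name interleave_schedule and the labels "A" and "B" are hypothetical and are not taken from the described controller. Each transfer is directed to whichever module is proportionally further from full, so that the longer module is served more frequently and neither module is complete substantially before the other.

```python
def interleave_schedule(len_a, len_b):
    """Return the order of transfers ('A' or 'B') for two modules.

    len_a and len_b are the module lengths in transfer units (assumed known).
    The module that is proportionally less full receives the next transfer.
    """
    schedule = []
    sent_a = sent_b = 0
    while sent_a < len_a or sent_b < len_b:
        # Compare sent_a/len_a with sent_b/len_b by cross-multiplication,
        # which avoids division (and division by zero for an empty module).
        a_is_behind = sent_a * len_b <= sent_b * len_a
        if a_is_behind and sent_a < len_a:
            schedule.append("A")
            sent_a += 1
        else:
            schedule.append("B")
            sent_b += 1
    return schedule


if __name__ == "__main__":
    # A module three times the length of its neighbour receives
    # three of every four transfers.
    print("".join(interleave_schedule(6, 2)))
```

In this sketch the shared memory is read on every transfer until both modules are full, rather than being left idle while the longer module alone is still being filled.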

[0009] In any printing system that includes multiple nozzles on a printhead or printhead module, there is the possibility of one or more of the nozzles failing in the field, or being inoperative due to manufacturing defect. Given the relatively large size of a typical printhead module, it would be desirable to provide some form of compensation for one or more “dead” nozzles. Where the printhead also outputs fixative on a per-nozzle basis, it is also desirable that the fixative is provided in such a way that dead nozzles are compensated for.

[0010] A printer controller can take the form of an integrated circuit, comprising a processor and one or more peripheral hardware units for implementing specific data manipulation functions. A number of these units and the processor may need access to a common resource such as memory. One way of arbitrating between multiple access requests for a common resource is timeslot arbitration, in which access to the resource is guaranteed to a particular requestor during a predetermined timeslot.

[0011] One difficulty with this arrangement lies in the fact that not all access requests make the same demands on the resource in terms of timing and latency. For example, a memory read requires that data be fetched from memory, which may take a number of cycles, whereas a memory write can commence immediately. Timeslot arbitration does not take into account these differences, which may result in accesses being performed in a less efficient manner than might otherwise be the case. It would be desirable to provide a timeslot arbitration scheme that improved this efficiency as compared with prior art timeslot arbitration schemes.

[0012] Also of concern when allocating resources in a timeslot arbitration scheme is the fact that the priority of an access request may not be the same for all units. For example, it would be desirable to provide a timeslot arbitration scheme in which one requestor (typically the memory) is granted special priority such that its requests are dealt with earlier than would be the case in the absence of such priority.
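By way of illustration only, the basic timeslot arbitration referred to in paragraph [0010] may be sketched as follows. The sketch assumes a fixed rotation of slots, each reserved for one requestor; the function name timeslot_arbiter and the requestor labels are hypothetical. It models only the simple scheme, without the read/write distinction or the special priority discussed above.

```python
def timeslot_arbiter(slot_owners, requests_per_cycle):
    """Grant the shared resource according to a fixed timeslot rotation.

    slot_owners        -- requestor reserved for each slot in the rotation
    requests_per_cycle -- for each cycle, the set of requestors asking for access
    Returns, for each cycle, the requestor granted access, or None if the
    owner of that slot made no request and the slot goes unused.
    """
    grants = []
    for cycle, requesting in enumerate(requests_per_cycle):
        owner = slot_owners[cycle % len(slot_owners)]
        # Access is guaranteed to the slot's owner and to no one else.
        grants.append(owner if owner in requesting else None)
    return grants


if __name__ == "__main__":
    rotation = ["CPU", "CDU", "LBD"]
    requests = [{"CPU", "CDU"}, {"CPU"}, {"LBD"}, set()]
    print(timeslot_arbiter(rotation, requests))
```

In the second cycle of this example one requestor is waiting but the slot belongs to another that has nothing to send, so the resource sits idle; this is the kind of inefficiency that a more flexible scheme would aim to reduce.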

[0013] In systems that use a memory and cache, a cache miss (in which an attempt to load data or an instruction from a cache fails) results in a memory access followed by a cache update. It is often desirable when updating the cache in this way to update data other than that which was actually missed. A typical example would be a cache miss for a byte resulting in an entire word or line of the cache associated with that byte being updated. However, this can have the effect of tying up bandwidth between the memory (or a memory manager) and the processor where the bandwidth is such that several cycles are required to transfer the entire word or line to the cache. It would be desirable to provide a mechanism for updating a cache that improved cache update speed and/or efficiency.

[0014] Most integrated circuits use an externally provided signal as (or to generate) a clock, often provided from a dedicated clock generation circuit. This is often due to the difficulties of providing an on-board clock that can operate at a speed that is predictable. Manufacturing tolerances of such on-board clock generation circuitry can result in clock rates that vary by a factor of two, and operating temperatures can increase this margin by an additional factor of two. In some cases, the particular rate at which the clock operates is not of particular concern. However, where the integrated circuit will be writing to an internal circuit that is sensitive to the time over which a signal is provided, it may be undesirable to have the signal be applied for too long or too short a time. For example, flash memory is sensitive to being written to for too long a period. It would be desirable to provide a mechanism for adjusting a rate of an on-chip system clock to take into account the impact of manufacturing variations on clock speed.

[0015] One form of attacking a secure chip is to induce (usually by increasing) a clock speed that takes the logic outside its rated operating frequency. One way of doing this is to reduce the temperature of the integrated circuit, which can cause the clock to race. Above a certain frequency, some logic will start malfunctioning. In some cases, the malfunction can be such that information on the chip that would otherwise be secure may become available to an external connection. It would be desirable to protect an integrated circuit from such attacks.

[0016] In an integrated circuit comprising non-volatile memory, a power failure can result in unintentional behaviour. For example, if an address or data becomes unreliable due to falling voltage supplied to the circuit but there is still sufficient power to cause a write, incorrect data can be written. Even worse, the data (incorrect or not) could be written to the wrong memory. The problem is exacerbated with multi-word writes. It would be desirable to provide a mechanism for reducing or preventing spurious writes when power to an integrated circuit is failing.

[0017] In an integrated circuit, it is often desirable to reduce unauthorised access to the contents of memory. This is particularly the case where the memory includes a key or some other form of security information that allows the integrated circuit to communicate with another entity (such as another integrated circuit, for example) in a secure manner. It would be particularly advantageous to prevent attacks involving direct probing of memory addresses by physically investigating the chip (as distinct from electronic or logical attacks via manipulation of signals and power supplied to the integrated circuit).

[0018] It is also desirable to provide an environment where the manufacturer of the integrated circuit (or some other authorised entity) can verify or authorize code to be run on an integrated circuit.

[0019] Another desideratum would be the ability of two or more entities, such as integrated circuits, to communicate with each other in a secure manner. It would also be desirable to provide a mechanism for secure communication between a first entity and a second entity, where the two entities, whilst capable of some form of secure communication, are not able to establish such communication between themselves.

[0020] In a system that uses resources (such as a printer, which uses inks) it may be desirable to monitor and update a record related to resource usage. Authenticating ink quality can be a major issue, since the attributes of inks used by a given printhead can be quite specific. Use of incorrect ink can result in anything from misfiring or poor performance to damage or destruction of the printhead. It would therefore be desirable to provide a system that enables authentication of the correct ink being used, as well as providing various support systems for securely enabling refilling of ink cartridges.

[0021] In a system that prevents unauthorized programs from being loaded onto or run on an integrated circuit, it can be laborious to allow developers of software to access the circuits during software development. Enabling access to integrated circuits of a particular type requires authenticating software with a relatively high-level key. Distributing the key for use by developers is inherently unsafe, since a single leak of the key outside the organization could endanger security of all chips that use a related key to authorize programs. Having a small number of people with high-security clearance available to authenticate programs for testing can be inconvenient, particularly in the case where frequent incremental changes in programs during development require testing. It would be desirable to provide a mechanism for allowing access to one or more integrated circuits without risking the security of other integrated circuits in a series of such integrated circuits.

[0022] In symmetric key security, a message, denoted by M, is plaintext. The process of transforming M into ciphertext C, where the substance of M is hidden, is called encryption. The process of transforming C back into M is called decryption. Referring to the encryption function as E, and the decryption function as D, we have the following identities:

E[M]=C

D[C]=M

[0023] Therefore the following identity is true:

D[E[M]]=M

[0024] A symmetric encryption algorithm is one where:

[0025] the encryption function E relies on key K₁,

[0026] the decryption function D relies on key K₂,

[0027] K₂ can be derived from K₁, and

[0028] K₁ can be derived from K₂.

[0029] In most symmetric algorithms, K₁ equals K₂. However, even if K₁ does not equal K₂, given that one key can be derived from the other, a single key K can suffice for the mathematical definition. Thus:

E_(K)[M]=C

D_(K)[C]=M
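By way of illustration only, the single-key identities above may be checked with a toy cipher. The repeating-key XOR used here is chosen purely because it is short; it is not secure, and it is not an algorithm referred to in this specification. The names xor_cipher, E and D are hypothetical.

```python
def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as required.

    Applying the function twice with the same key restores the input,
    so the one function serves as both E_K and D_K.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


E = xor_cipher  # E_(K)[M] = C
D = xor_cipher  # D_(K)[C] = M

if __name__ == "__main__":
    K = b"\x5a\xc3\x17"
    M = b"example plaintext"
    C = E(K, M)
    print(D(K, C) == M)  # the identity D[E[M]] = M holds
```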

[0030] The security of these algorithms rests very much in the key K. Knowledge of K allows anyone to encrypt or decrypt. Consequently K must remain a secret for the duration of the value of M. For example, M may be a wartime message “My current position is grid position 123-456”. Once the war is over the value of M is greatly reduced, and if K is made public, the knowledge of the combat unit's position may be of no relevance whatsoever. The security of the particular symmetric algorithm is a function of two things: the strength of the algorithm and the length of the key.

[0031] An asymmetric encryption algorithm is one where:

[0032] the encryption function E relies on key K₁,

[0033] the decryption function D relies on key K₂,

[0034] K₂ cannot be derived from K₁ in a reasonable amount of time, and

[0035] K₁ cannot be derived from K₂ in a reasonable amount of time.

[0036] Thus:

E_(K1)[M]=C

D_(K2)[C]=M

[0037] These algorithms are also called public-key because one key K₁ can be made public. Thus anyone can encrypt a message (using K₁) but only the person with the corresponding decryption key (K₂) can decrypt and thus read the message.

[0038] In most cases, the following identity also holds:

E_(K2)[M]=C

D_(K1)[C]=M

[0039] This identity is very important because it implies that anyone with the public key K₁ can see M and know that it came from the owner of K₂. No-one else could have generated C because to do so would imply knowledge of K₂. This gives rise to a different application, unrelated to encryption - digital signatures.
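By way of illustration only, both pairs of identities may be checked using textbook RSA with very small primes. RSA is used here as an assumed example of an asymmetric algorithm; the parameter values and the name apply_key are hypothetical, and numbers of this size provide no security.

```python
# Toy RSA parameters (illustration only; far too small to be secure).
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

K1 = (e, n)                  # key that can be made public
K2 = (d, n)                  # key that is kept secret


def apply_key(key, value):
    """Modular exponentiation; acts as E or D depending on the key supplied."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


M = 65                               # message, encoded as an integer below n

C = apply_key(K1, M)                 # E_(K1)[M] = C
recovered = apply_key(K2, C)         # D_(K2)[C] = M

S = apply_key(K2, M)                 # E_(K2)[M] = C, used as a signature
verified = apply_key(K1, S)          # D_(K1)[C] = M, checked by anyone holding K1

if __name__ == "__main__":
    print(recovered == M, verified == M)
```

In the second pair, anyone holding K1 can recover M from S, but only the holder of K2 could have produced S, which is the property relied on for digital signatures.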

[0040] A number of public key cryptographic algorithms exist. Most are impractical to implement, and many generate a very large C for a given M or require enormous keys. Still others, while secure, are far too slow to be practical for several years. Because of this, many public key systems are hybrid - a public key mechanism is used to transmit a symmetric session key, and then the session key is used for the actual messages.

[0041] All of the algorithms have a problem in terms of key selection. A random number is simply not secure enough. The two large primes p and q must be chosen carefully; there are certain weak combinations that can be factored more easily (some of the weak keys can be tested for). But nonetheless, key selection is not a simple matter of randomly selecting 1024 bits for example. Consequently the key selection process must also be secure.

[0042] Symmetric and asymmetric schemes both suffer from a difficulty in allowing establishment of multiple relationships between one entity and two or more others, without the need to provide multiple sets of keys. For example, if a main entity wants to establish secure communications with two or more additional entities, it will need to maintain a different key for each of the additional entities. For practical reasons, it is desirable to avoid generating and storing large numbers of keys. To reduce key numbers, two or more of the entities may use the same key to communicate with the main entity. However, this means that the main entity cannot be sure which of the entities it is communicating with. Similarly, messages from the main entity to one of the entities can be decrypted by any of the other entities with the same key. It would be desirable if a mechanism could be provided to allow secure communication between a main entity and one or more other entities that overcomes at least some of the shortcomings of prior art.

[0043] In a system where a first entity is capable of secure communication of some form, it may be desirable to establish a relationship with another entity without providing the other entity with any information related to the first entity's security features. Typically, the security features might include a key or a cryptographic function. It would be desirable to provide a mechanism for enabling secure communications between a first and second entity when they do not share the requisite secret function, key or other relationship to enable them to establish trust.

[0044] A number of other aspects, features, preferences and embodiments are disclosed in the Detailed Description of the Preferred Embodiment below.

SUMMARY OF THE INVENTION

[0045] In accordance with the invention, there is provided a printer controller configured to generate dot data for supply to a printhead that includes at least first and second longitudinally extending printhead chips that are positioned adjacent each other either side of a join region such that a printing width of the printhead is wider than the length of either printhead chip, the printer controller being configured such that, in the event that the printhead chips to which dot data is being supplied are of sufficiently unequal relative length, the dot data is supplied more frequently, or at a higher rate, to the longer of the two printhead chips.

[0046] Preferably, the printer controller is configured to supply the dot data to the printhead modules such that none of the printhead modules is full and ready for printing substantially earlier than any of the other printhead modules.

[0047] Preferably, the dot data is supplied to the printhead from a memory under the control of the printhead controller.

[0048] It is particularly preferred that the printer controller include a hardware module for undertaking the task of bandwidth management. More preferably, the hardware module is also configured to compensate for different length printheads.

[0049] It is particularly preferred that the printer controller be configured to manipulate the supply of dot data to each of the printhead modules such that memory bandwidth usage is substantially constant during a printhead loading cycle.

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] Preferred and other embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

[0051]FIG. 1 is an example of state machine notation

[0052]FIG. 2 shows document data flow in a printer

[0053]FIG. 3 is an example of a single printer controller (hereinafter “SoPEC”) A4 simplex printer system

[0054]FIG. 4 is an example of a dual SoPEC A4 duplex printer system

[0055]FIG. 5 is an example of a dual SoPEC A3 simplex printer system

[0056]FIG. 6 is an example of a quad SoPEC A3 duplex printer system

[0057]FIG. 7 is an example of a SoPEC A4 simplex printing system with an extra SoPEC used as DRAM storage

[0058]FIG. 8 is an example of an A3 duplex printing system featuring four printing SoPECs

[0059]FIG. 9 shows pages containing different numbers of bands

[0060]FIG. 10 shows the contents of a page band

[0061]FIG. 11 illustrates a page data path from host to SoPEC

[0062]FIG. 12 shows a page structure

[0063]FIG. 13 shows a SoPEC system top level partition

[0064]FIG. 14 shows a SoPEC CPU memory map (not to scale)

[0065]FIG. 15 is a block diagram of CPU

[0066]FIG. 16 shows CPU bus transactions

[0067]FIG. 17 shows a state machine for a CPU subsystem slave

[0068]FIG. 18 shows a SoPEC CPU memory map (not to scale)

[0069]FIG. 19 shows an external signal view of a memory management unit (hereinafter “MMU”) sub-block partition

[0070]FIG. 20 shows an internal signal view of an MMU sub-block partition

[0071]FIG. 21 shows a DRAM write buffer

[0072]FIG. 22 shows DIU waveforms for multiple transactions

[0073]FIG. 23 shows a SoPEC LEON CPU core

[0074]FIG. 24 shows a cache data RAM wrapper

[0075]FIG. 25 shows a realtime debug unit block diagram

[0076]FIG. 26 shows interrupt acknowledge cycles for single and pending interrupts

[0077]FIG. 27 shows an A3 duplex system featuring four printing SoPECs with a single SoPEC DRAM device

[0078]FIG. 28 is an SCB block diagram

[0079]FIG. 29 is a logical view of the SCB of FIG. 28

[0080]FIG. 30 shows an ISI configuration with four SoPEC devices

[0081]FIG. 31 shows half-duplex interleaved transmission from ISIMaster to ISISlave

[0082]FIG. 32 shows ISI transactions

[0083]FIG. 33 shows an ISI long packet

[0084]FIG. 34 shows an ISI ping packet

[0085]FIG. 35 shows a short ISI packet

[0086]FIG. 36 shows successful transmission of two long packets with sequence bit toggling

[0087]FIG. 37 shows sequence bit operation with errored long packet

[0088]FIG. 38 shows sequence bit operation with ACK error

[0089]FIG. 39 shows an ISI sub-block partition

[0090]FIG. 40 shows an ISI serial interface engine functional block diagram

[0091]FIG. 41 is an SIE edge detection and data IO diagram

[0092]FIG. 42 is an SIE Rx/Tx state machine Tx cycle state diagram

[0093]FIG. 43 shows an SIE Rx/Tx state machine Tx bit stuff ‘0’ cycle state diagram

[0094]FIG. 44 shows an SIE Rx/Tx state machine Tx bit stuff ‘1’ cycle state diagram

[0095]FIG. 45 shows an SIE Rx/Tx state machine Rx cycle state diagram

[0096]FIG. 46 shows an SIE Tx functional timing example

[0097]FIG. 47 shows an SIE Rx functional timing example

[0098]FIG. 48 shows an SIE Rx/Tx FIFO block diagram

[0099]FIG. 49 shows SIE Rx/Tx FIFO control signal gating

[0100]FIG. 50 shows an SIE bit stuffing state machine Tx cycle state diagram

[0101]FIG. 51 shows an SIE bit stripping state machine Rx cycle state diagram

[0102]FIG. 52 shows a CRC16 generation/checking shift register

[0103]FIG. 53 shows circular buffer operation

[0104]FIG. 54 shows duty cycle select

[0105]FIG. 55 shows a GPIO partition

[0106]FIG. 56 shows a motor control RTL diagram

[0107]FIG. 57 is an input de-glitch RTL diagram

[0108]FIG. 58 is a frequency analyser RTL diagram

[0109]FIG. 59 shows a brushless DC controller

[0110]FIG. 60 shows a period measure unit

[0111]FIG. 61 shows line synch generation logic

[0112]FIG. 62 shows an ICU partition

[0113]FIG. 63 is an interrupt clear state diagram

[0114]FIG. 64 is a watchdog timer RTL diagram

[0115]FIG. 65 is a generic timer RTL diagram

[0116]FIG. 67 is a Pulse generator RTL diagram

[0117]FIG. 68 shows a SoPEC clock relationship

[0118]FIG. 69 shows a CPR block partition

[0119]FIG. 70 shows reset deglitch logic

[0120]FIG. 71 shows reset synchronizer logic

[0121]FIG. 72 is a clock gate logic diagram

[0122]FIG. 73 shows a PLL and Clock divider logic

[0123]FIG. 74 shows a PLL control state machine diagram

[0124]FIG. 75 shows a LSS master system-level interface

[0125]FIG. 76 shows START and STOP conditions

[0126]FIG. 77 shows an LSS transfer of 2 data bytes

[0127]FIG. 78 is an example of an LSS write to a QA Chip

[0128]FIG. 79 is an example of an LSS read from QA Chip

[0129]FIG. 80 shows an LSS block diagram

[0130]FIG. 81 shows an LSS multi-command transaction

[0131]FIG. 82 shows start and stop generation based on previous bus state

[0132]FIG. 83 shows an LSS master state machine

[0133]FIG. 84 shows LSS master timing

[0134]FIG. 85 shows a SoPEC system top level partition

[0135]FIG. 86 shows a read bus with 3 cycle random DRAM read accesses

[0136]FIG. 87 shows interleaving of CPU and non-CPU read accesses

[0137]FIG. 88 shows interleaving of read and write accesses with 3 cycle random DRAM accesses

[0138]FIG. 89 shows interleaving of write accesses with 3 cycle random DRAM accesses

[0139]FIG. 90 shows a read protocol for a SoPEC Unit making a single 256-bit access

[0140]FIG. 91 shows a read protocol for a SoPEC Unit making a single 256-bit access

[0141]FIG. 92 shows a write protocol for a SoPEC Unit making a single 256-bit access

[0142]FIG. 93 shows a protocol for a posted, masked, 128-bit write by the CPU

[0143]FIG. 94 shows a write protocol for the CDU making four contiguous 64-bit accesses

[0144]FIG. 95 shows timeslot-based arbitration

[0145]FIG. 96 shows timeslot-based arbitration with separate pointers

[0146]FIG. 97 shows a first example (a) of separate read and write arbitration

[0147]FIG. 98 shows a second example (b) of separate read and write arbitration

[0148]FIG. 99 shows a third example (c) of separate read and write arbitration

[0149]FIG. 100 shows a DIU partition

[0150]FIG. 101 shows a DIU partition

[0151]FIG. 102 shows multiplexing and address translation logic for two memory instances

[0152]FIG. 103 shows timing of dau_dcu_valid, dcu_dau_adv and dcu_dau_wadv

[0153]FIG. 104 shows a DCU state machine

[0154]FIG. 105 shows random read timing

[0155]FIG. 106 shows random write timing

[0156]FIG. 107 shows refresh timing

[0157]FIG. 108 shows page mode write timing

[0158]FIG. 109 shows timing of non-CPU DIU read access

[0159]FIG. 110 shows timing of CPU DIU read access

[0160]FIG. 111 shows a CPU DIU read access

[0161]FIG. 112 shows timing of CPU DIU write access

[0162]FIG. 113 shows timing of a non-CDU/non-CPU DIU write access

[0163]FIG. 114 shows timing of CDU DIU write access

[0164]FIG. 115 shows command multiplexor sub-block partition

[0165]FIG. 116 shows command multiplexor timing at DIU requesters interface

[0166]FIG. 117 shows generation of re_arbitrate and re_arbitrate_wadv

[0167]FIG. 118 shows CPU interface and arbitration logic

[0168]FIG. 119 shows arbitration timing

[0169]FIG. 120 shows setting RotationSync to enable a new rotation.

[0170]FIG. 121 shows a timeslot based arbitration

[0171]FIG. 122 shows a timeslot based arbitration with separate pointers

[0172]FIG. 123 shows a CPU pre-access write lookahead pointer

[0173]FIG. 124 shows arbitration hierarchy

[0174]FIG. 125 shows hierarchical round-robin priority comparison

[0175]FIG. 126 shows a read multiplexor partition

[0176]FIG. 127 shows a read command queue (4 deep buffer)

[0177]FIG. 128 shows state-machines for shared read bus accesses

[0178]FIG. 129 shows a write multiplexor partition

[0179]FIG. 130 shows a read multiplexer timing for back-to-back shared read bus transfer

[0180]FIG. 131 shows a write multiplexer partition

[0181]FIG. 132 shows a block diagram of a PCU

[0182]FIG. 133 shows PCU accesses to PEP registers

[0183]FIG. 134 shows command arbitration and execution

[0184]FIG. 135 shows DRAM command access state machine

[0185]FIG. 136 shows an outline of contone data flow with respect to CDU

[0186]FIG. 137 shows a DRAM storage arrangement for a single line of JPEG 8×8 blocks in 4 colors

[0187]FIG. 138 shows a read control unit state machine

[0188]FIG. 139 shows a memory arrangement of JPEG blocks

[0189]FIG. 140 shows a contone data write state machine

[0190]FIG. 141 shows lead-in and lead-out clipping of contone data in multi-SoPEC environment

[0191]FIG. 142 shows a block diagram of CFU

[0192]FIG. 143 shows a DRAM storage arrangement for a single line of JPEG blocks in 4 colors

[0193]FIG. 144 shows a block diagram of color space converter

[0194]FIG. 145 shows a converter/invertor

[0195]FIG. 146 shows a high-level block diagram of LBD in context

[0196]FIG. 147 shows a schematic outline of the LBD and the SFU

[0197]FIG. 148 shows a block diagram of lossless bi-level decoder

[0198]FIG. 149 shows a stream decoder block diagram

[0199]FIG. 150 shows a command controller block diagram

[0200]FIG. 151 shows a state diagram for command controller (CC) state machine

[0201]FIG. 152 shows a next edge unit block diagram

[0202]FIG. 153 shows a next edge unit buffer diagram

[0203]FIG. 154 shows a next edge unit edge detect diagram

[0204]FIG. 155 shows a state diagram for the next edge unit state machine

[0205]FIG. 156 shows a line fill unit block diagram

[0206]FIG. 157 shows a state diagram for the Line Fill Unit (LFU) state machine

[0207]FIG. 158 shows a bi-level DRAM buffer

[0208]FIG. 159 shows interfaces between LBD/SFU/HCU

[0209]FIG. 160 shows an SFU sub-block partition

[0210]FIG. 161 shows an LBDPrevLineFifo sub-block

[0211]FIG. 162 shows timing of signals on the LBDPrevLineFIFO interface to DIU and address generator

[0212]FIG. 163 shows timing of signals on LBDPrevLineFIFO interface to DIU and address generator

[0213]FIG. 164 shows LBDNextLineFifo sub-block

[0214]FIG. 165 shows timing of signals on LBDNextLineFIFO interface to DIU and address generator

[0215]FIG. 166 shows LBDNextLineFIFO DIU interface state diagram

[0216]FIG. 167 shows an LBD to SFU write interface

[0217]FIG. 168 shows an LBD to SFU read interface (within a line)

[0218]FIG. 169 shows an HCUReadLineFifo Sub-block

[0219]FIG. 170 shows a DIU write Interface

[0220]FIG. 171 shows a DIU Read Interface multiplexing by select_hrfplf

[0221]FIG. 172 shows DIU read request arbitration logic

[0222]FIG. 173 shows address generation

[0223]FIG. 174 shows an X scaling control unit

[0224]FIG. 175 shows a Y scaling control unit

[0225]FIG. 176 shows an overview of X and Y scaling at HCU interface

[0226]FIG. 177 shows a high level block diagram of TE in context

[0227]FIG. 178 shows a QR Code

[0228]FIG. 179 shows Netpage tag structure

[0229]FIG. 180 shows a Netpage tag with data rendered at 1600 dpi (magnified view)

[0230]FIG. 181 shows an example of 2×2 dots for each block of QR code

[0231]FIG. 182 shows placement of tags for portrait & landscape printing

[0232]FIG. 183 shows a general representation of tag placement

[0233]FIG. 184 shows composition of SoPEC's tag format structure

[0234]FIG. 185 shows a simple 3×3 tag structure

[0235]FIG. 186 shows 3×3 tag redesigned for 21×21 area (not simple replication)

[0236]FIG. 187 shows a TE Block Diagram

[0237]FIG. 188 shows a TE Hierarchy

[0238]FIG. 189 shows a block diagram of PCU accesses

[0239]FIG. 190 shows a tag encoder top-level FSM

[0240]FIG. 191 shows generated control signals

[0241]FIG. 192 shows logic to combine dot information and encoded data

[0242]FIG. 193 shows generation of Lastdotintag/1

[0243]FIG. 194 shows generation of Dot Position Valid

[0244]FIG. 195 shows generation of write enable to the TFU

[0245]FIG. 196 shows generation of Tag Dot Number

[0246]FIG. 197 shows TDI Architecture

[0247]FIG. 198 shows data flow through the TDI

[0248]FIG. 199 shows raw tag data interface block diagram

[0249]FIG. 200 shows an RTDI State Flow Diagram

[0250]FIG. 201 shows a relationship between TE_endoftagdata, cdu_startofbandstore and cdu_endofbandstore

[0251]FIG. 202 shows a TDI State Flow Diagram

[0252]FIG. 203 shows mapping of the tag data to codewords 0-7

[0253]FIG. 204 shows coding and mapping of uncoded fixed tag data for (15,5) RS encoder

[0254]FIG. 205 shows mapping of pre-coded fixed tag data

[0255]FIG. 206 shows coding and mapping of variable tag data for (15,7) RS encoder

[0256]FIG. 207 shows coding and mapping of uncoded fixed tag data for (15,7) RS encoder

[0257]FIG. 208 shows mapping of 2D decoded variable tag data

[0258]FIG. 209 shows a simple block diagram for an m=4 Reed Solomon encoder

[0259]FIG. 210 shows an RS encoder I/O diagram

[0260]FIG. 211 shows a (15,5) & (15,7) RS encoder block diagram

[0261]FIG. 212 shows a (15,5) RS encoder timing diagram

[0262]FIG. 213 shows a (15,7) RS encoder timing diagram

[0263]FIG. 214 shows a circuit for multiplying by alpha³

[0264]FIG. 215 shows adding two field elements

[0265]FIG. 216 shows an RS encoder implementation

[0266]FIG. 217 shows an encoded tag data interface

[0267]FIG. 218 shows an encoded fixed tag data interface

[0268]FIG. 219 shows an encoded variable tag data interface

[0269]FIG. 220 shows an encoded variable tag data sub-buffer

[0270]FIG. 221 shows a breakdown of the tag format structure

[0271]FIG. 222 shows a TFSI FSM state flow diagram

[0272]FIG. 223 shows a TFS block diagram

[0273]FIG. 224 shows a table A interface block diagram

[0274]FIG. 225 shows a table A address generator

[0275]FIG. 226 shows a table C interface block diagram

[0276]FIG. 227 shows a table B interface block diagram

[0277]FIG. 228 shows interfaces between TE, TFU and HCU

[0278]FIG. 229 shows a 16-byte FIFO in TFU

[0279]FIG. 230 shows a high level block diagram showing the HCU and its external interfaces

[0280]FIG. 231 shows a block diagram of the HCU

[0281]FIG. 232 shows a block diagram of the control unit

[0282]FIG. 233 shows a block diagram of determine advdot unit

[0283]FIG. 234 shows a page structure

[0284]FIG. 235 shows a block diagram of a margin unit

[0285]FIG. 236 shows a block diagram of a dither matrix table interface

[0286]FIG. 237 shows an example of reading lines of dither matrix from DRAM

[0287]FIG. 238 shows a state machine to read dither matrix table

[0288]FIG. 239 shows a contone dotgen unit

[0289]FIG. 240 shows a block diagram of dot reorg unit

[0290]FIG. 241 shows an HCU to DNC interface (also used in DNC to DWU, LLU to PHI)

[0291]FIG. 242 shows SFU to HCU interface (all feeders to HCU)

[0292]FIG. 243 shows representative logic of the SFU to HCU interface

[0293]FIG. 244 shows a high-level block diagram of DNC

[0294]FIG. 245 shows a dead nozzle table format

[0295]FIG. 246 shows set of dots operated on for error diffusion

[0296]FIG. 247 shows a block diagram of DNC

[0297]FIG. 248 shows a sub-block diagram of ink replacement unit

[0298]FIG. 249 shows a dead nozzle table state machine

[0299]FIG. 250 shows logic for dead nozzle removal and ink replacement

[0300]FIG. 251 shows a sub-block diagram of error diffusion unit

[0301]FIG. 252 shows a maximum length 32-bit LFSR used for random bit generation

[0302]FIG. 253 shows a high-level data flow diagram of DWU in context

[0303]FIG. 254 shows a printhead nozzle layout for 36-nozzle bi-lithic printhead

[0304]FIG. 255 shows a printhead nozzle layout for a 36-nozzle bi-lithic printhead

[0305]FIG. 256 shows a dot line store logical representation

[0306]FIG. 257 shows a conceptual view of printhead row alignment

[0307]FIG. 258 shows a conceptual view of printhead rows (as seen by the LLU and PHI)

[0308]FIG. 259 shows a comparison of 1.5× v 2× buffering

[0309]FIG. 260 shows an even dot order in DRAM (increasing sense, 13320 dot wide line)

[0310]FIG. 261 shows an even dot order in DRAM (decreasing sense, 13320 dot wide line)

[0311]FIG. 262 shows a dotline FIFO data structure in DRAM

[0312]FIG. 263 shows a DWU partition

[0313]FIG. 264 shows a buffer address generator sub-block

[0314]FIG. 265 shows a DIU Interface sub-block

[0315]FIG. 266 shows an interface controller state diagram

[0316]FIG. 267 shows a high level data flow diagram of LLU in context

[0317]FIG. 268 shows paper and printhead nozzles relationship (example with D₁=D₂=5)

[0318]FIG. 269 shows printhead structure and dot generate order

[0319]FIG. 270 shows an order of dot data generation and transmission

[0320]FIG. 271 shows a conceptual view of printhead rows

[0321]FIG. 272 shows a dotline FIFO data structure in DRAM (LLU specification)

[0322]FIG. 273 shows an LLU partition

[0323]FIG. 274 shows a dot generator RTL diagram

[0324]FIG. 275 shows a DIU interface

[0325]FIG. 276 shows an interface controller state diagram

[0326]FIG. 277 shows high-level data flow diagram of PHI in context

[0327]FIG. 278 is intentionally omitted

[0328]FIG. 279 shows printhead data rate equalization

[0329]FIG. 280 shows a printhead structure and dot generate order

[0330]FIG. 281 shows an order of dot data generation and transmission

[0331]FIG. 282 shows an order of dot data generation and transmission (single printhead case)

[0332]FIG. 283 shows printhead interface timing parameters

[0333]FIG. 284 shows printhead timing with margining

[0334]FIG. 285 shows a PHI block partition

[0335]FIG. 286 shows a sync generator state diagram

[0336]FIG. 287 shows a line sync de-glitch RTL diagram

[0337]FIG. 288 shows a fire generator state diagram

[0338]FIG. 289 shows a PHI controller state machine

[0339]FIG. 290 shows a datapath unit partition

[0340]FIG. 291 shows a dot order controller state diagram

[0341]FIG. 292 shows a data generator state diagram

[0342]FIG. 293 shows data serializer timing

[0343]FIG. 294 shows a data serializer RTL Diagram

[0344]FIG. 295 shows printhead types 0 to 7

[0345]FIG. 296 shows an ideal join between two bilithic printhead segments

[0346]FIG. 297 shows an example of a join between two bilithic printhead segments

[0347]FIG. 298 shows printable vs non-printable area under new definition (looking at colors as if 1 row only)

[0348]FIG. 299 shows identification of printhead nozzles and shift-register sequences for printheads in arrangement 1

[0349]FIG. 300 shows demultiplexing of data within the printheads in arrangement 1

[0350]FIG. 301 shows double data rate signalling for a type 0 printhead in arrangement 1

[0351]FIG. 302 shows double data rate signalling for a type 1 printhead in arrangement 1

[0352]FIG. 303 shows identification of printhead nozzles and shift-register sequences for printheads in arrangement 2

[0353]FIG. 304 shows demultiplexing of data within the printheads in arrangement 2

[0354]FIG. 305 shows double data rate signalling for a type 0 printhead in arrangement 2

[0355]FIG. 306 shows double data rate signalling for a type 1 printhead in arrangement 2

[0356]FIG. 307 shows all 8 printhead arrangements

[0357]FIG. 308 shows a printhead structure

[0358]FIG. 309 shows a column Structure

[0359]FIG. 310 shows a printhead dot shift register dot mapping to page

[0360]FIG. 311 shows data timing during printing

[0361]FIG. 312 shows print quality

[0362]FIG. 313 shows fire and select shift register setup for printing

[0363]FIG. 314 shows a fire pattern across butt end of printhead chips

[0364]FIG. 315 shows fire pattern generation

[0365]FIG. 316 shows determination of select shift register value

[0366]FIG. 317 shows timing for printing signals

[0367]FIG. 318 shows initialisation of printheads

[0368]FIG. 319 shows a nozzle test latching circuit

[0369]FIG. 320 shows nozzle testing

[0370]FIG. 321 shows a temperature reading

[0371]FIG. 322 shows CMOS testing

[0372]FIG. 323 shows a reticle layout

[0373]FIG. 324 shows a stepper pattern on Wafer

[0374]FIG. 325 shows relationship between datasets

[0375]FIG. 326 shows a validation hierarchy

[0376]FIG. 327 shows development of operating system code

[0377]FIG. 328 shows protocol for directly verifying reads from ChipR

[0378]FIG. 329 shows a signature translation protocol

[0379]FIG. 330 shows a protocol for a direct authenticated write

[0380]FIG. 331 shows an alternative protocol for a direct authenticated write

[0381]FIG. 332 shows a protocol for basic update of permissions

[0382]FIG. 333 shows a protocol for a multiple-key update

[0383]FIG. 334 shows a protocol for a single-key authenticated read

[0384]FIG. 335 shows a protocol for a single-key authenticated write

[0385]FIG. 336 shows a protocol for a single-key update of permissions

[0386]FIG. 337 shows a protocol for a single-key update

[0387]FIG. 338 shows a protocol for a multiple-key single-M authenticated read

[0388]FIG. 339 shows a protocol for a multiple-key authenticated write

[0389]FIG. 340 shows a protocol for a multiple-key update of permissions

[0390]FIG. 341 shows a protocol for a multiple-key update

[0391]FIG. 342 shows a protocol for a multiple-key multiple-M authenticated read

[0392]FIG. 343 shows a protocol for a multiple-key authenticated write

[0393]FIG. 344 shows a protocol for a multiple-key update of permissions

[0394]FIG. 345 shows a protocol for a multiple-key update

[0395]FIG. 346 shows relationship of permissions bits to M[n] access bits

[0396]FIG. 347 shows 160-bit maximal period LFSR

[0397]FIG. 348 shows clock filter

[0398]FIG. 349 shows tamper detection line

[0399]FIG. 350 shows an oversize nMOS transistor layout of Tamper Detection Line

[0400]FIG. 351 shows a Tamper Detection Line

[0401]FIG. 352 shows how Tamper Detection Lines cover the Noise Generator

[0402]FIG. 353 shows a prior art FET Implementation of CMOS inverter

[0403]FIG. 354 shows non-flashing CMOS

[0404]FIG. 355 shows components of a printer-based refill device

[0405]FIG. 356 shows refilling of printers by printer-based refill device

[0406]FIG. 357 shows components of a home refill station

[0407]FIG. 358 shows a three-ink reservoir unit

[0408]FIG. 359 shows refill of ink cartridges in a home refill station

[0409]FIG. 360 shows components of a commercial refill station

[0410]FIG. 361 shows an ink reservoir unit

[0411]FIG. 362 shows refill of ink cartridges in a commercial refill station (showing a single refill unit)

[0412]FIG. 363 shows equivalent signature generation

[0413]FIG. 364 shows a basic field definition

[0414]FIG. 365 shows an example of defining field sizes and positions

[0415]FIG. 366 shows permissions

[0416]FIG. 367 shows a first example of permissions for a field

[0417]FIG. 368 shows a second example of permissions for a field

[0418]FIG. 369 shows field attributes

[0419]FIG. 370 shows an output signature generation data format for Read

[0420]FIG. 371 shows an input signature verification data format for Test

[0421]FIG. 372 shows an output signature generation data format for Translate

[0422]FIG. 373 shows an input signature verification data format for WriteAuth

[0423]FIG. 374 shows input signature data format for ReplaceKey

[0424]FIG. 375 shows a key replacement map

[0425]FIG. 376 shows a key replacement map after K₁ is replaced

[0426]FIG. 377 shows a key replacement process

[0427]FIG. 378 shows an output signature data format for GetProgramKey

[0428]FIG. 379 shows transfer and rollback process

[0429]FIG. 380 shows an upgrade flow

[0430]FIG. 381 shows authorised ink refill paths in the printing system

[0431]FIG. 382 shows an input signature verification data format for XferAmount

[0432]FIG. 383 shows a transfer and rollback process

[0433]FIG. 384 shows an upgrade flow

[0434]FIG. 385 shows authorised upgrade paths in the printing system

[0435]FIG. 386 shows a direct signature validation sequence

[0436]FIG. 387 shows signature validation using translation

[0437]FIG. 388 shows setup of preauth field attributes

[0438]FIG. 388A shows setup for multiple preauth fields

[0439]FIG. 389 shows a high level block diagram of QA Chip

[0440]FIG. 390 shows an analogue unit

[0441]FIG. 391 shows a serial bus protocol for trimming

[0442]FIG. 392 shows a block diagram of a trim unit

[0443]FIG. 393 shows a block diagram of a CPU of the QA chip

[0444]FIG. 394 shows block diagram of an MIU

[0445]FIG. 395 shows a block diagram of memory components

[0446]FIG. 396 shows a first byte sent to an IOU

[0447]FIG. 397 shows a block diagram of the IOU

[0448]FIG. 398 shows a relationship between external SDa and SClk and generation of internal signals

[0449]FIG. 399 shows block diagram of ALU

[0450]FIG. 400 shows a block diagram of DataSel

[0451]FIG. 401 shows a block diagram of ROR

[0452]FIG. 402 shows a block diagram of the ALU's IO block

[0453]FIG. 403 shows a block diagram of PCU

[0454]FIG. 404 shows a block diagram of an Address Generator Unit

[0455]FIG. 405 shows a block diagram for a Counter Unit

[0456]FIG. 406 shows a block diagram of PMU

[0457]FIG. 407 shows a state machine for PMU

[0458]FIG. 408 shows a block diagram of MRU

[0459]FIG. 409 shows simplified MAU state machine

[0460]FIG. 410 shows power-on reset behaviour

[0461]FIG. 411 shows a ring oscillator block diagram

[0462]FIG. 412 shows a system clock duty cycle

[0463]FIG. 413 shows power-on reset

DETAILED DESCRIPTION OF PREFERRED AND OTHER EMBODIMENTS

[0464] It will be appreciated that the detailed description that follows takes the form of a highly detailed design of the invention, including supporting hardware and software. A high level of detailed disclosure is provided to ensure that one skilled in the art will have ample guidance for implementing the invention.

[0465] Imperative phrases such as “must”, “requires”, “necessary” and “important” (and similar language) should be read as being indicative of being necessary only for the preferred embodiment actually being described. As such, unless the opposite is clear from the context, imperative wording should not be interpreted as such. Nothing in the detailed description is to be understood as limiting the scope of the invention, which is intended to be defined as widely as is defined in the accompanying claims.

[0466] Indications of expected rates, frequencies, costs, and other quantitative values are exemplary and estimated only, and are made in good faith. Nothing in this specification should be read as implying that a particular commercial embodiment is or will be capable of a particular performance level in any measurable area.

[0467] It will be appreciated that the principles, methods and hardware described throughout this document can be applied to other fields. Much of the security-related disclosure, for example, can be applied to many other fields that require secure communications between entities, and certainly has application far beyond the field of printers.

[0468] System Overview

[0469] The preferred embodiment of the present invention is implemented in a printer using microelectromechanical systems (MEMS) printheads. The printer can receive data from, for example, a personal computer such as an IBM compatible PC or Apple computer. In other embodiments, the printer can receive data directly from, for example, a digital still or video camera. The particular choice of communication link is not important, and can be based, for example, on USB, Firewire, Bluetooth or any other wireless or hardwired communications protocol.

[0470] Print System Overview

[0471] 3 Introduction

[0472] This document describes the SoPEC (Small office home office Print Engine Controller) ASIC (Application Specific Integrated Circuit) suitable for use in, for example, SoHo printer products. The SoPEC ASIC is intended to be a low cost solution for bi-lithic printhead control, replacing the multichip solutions in larger more professional systems with a single chip. The increased cost competitiveness is achieved by integrating several systems such as a modified PEC1 printing pipeline, CPU control system, peripherals and memory sub-system onto one SoC ASIC, reducing component count and simplifying board design.

[0473] This section will give a general introduction to Memjet printing systems, introduce the components that make a bi-lithic printhead system, describe possible system architectures and show how several SoPECs can be used to achieve A3 and A4 duplex printing. The section “SoPEC ASIC” describes the SoC SoPEC ASIC, with subsections describing the CPU, DRAM and Print Engine Pipeline subsystems. Each section gives a detailed description of the blocks used and their operation within the overall print system. The final section describes the bi-lithic printhead construction and associated implications to the system due to its makeup.

[0474] 4 Nomenclature

[0475] 4.1 bi-Lithic Printhead Notation

[0476] A bi-lithic based printhead is constructed from 2 printhead ICs of varying sizes. The notation M:N is used to express the size relationship of each IC, where M specifies one printhead IC in inches and N specifies the remaining printhead IC in inches.

[0477] The ‘SoPEC/MoPEC Bilithic Printhead Reference’ document [10] contains a description of the bilithic printhead and related terminology.

[0478] 4.2 Definitions

[0479] The following terms are used throughout this specification:
Bi-lithic | Refers to a printhead constructed from 2 printhead ICs.
CPU | Refers to CPU core, caching system and MMU.
ISI-Bridge chip | A device with a high speed interface (such as USB2.0, Ethernet or IEEE1394) and one or more ISI interfaces. The ISI-Bridge would be the ISIMaster for each of the ISI buses it interfaces to.
ISIMaster | The ISIMaster is the only device allowed to initiate communication on the Inter SoPEC Interface (ISI) bus. The ISIMaster interfaces with the host.
ISISlave | Multi-SoPEC systems will contain one or more ISISlave SoPECs connected to the ISI bus. ISISlaves can only respond to communication initiated by the ISIMaster.
LEON | Refers to the LEON CPU core.
LineSyncMaster | The LineSyncMaster device generates the line synchronisation pulse that all SoPECs in the system must synchronise their line outputs to.
Multi-SoPEC | Refers to a SoPEC based print system with multiple SoPEC devices.
Netpage | Refers to a page printed with tags (normally in infrared ink).
PEC1 | Refers to Print Engine Controller version 1, precursor to SoPEC, used to control printheads constructed from multiple angled printhead segments.
Printhead IC | Single MEMS IC used to construct a bi-lithic printhead.
PrintMaster | The PrintMaster device is responsible for coordinating all aspects of the print operation. There may only be one PrintMaster in a system.
QA Chip | Quality Assurance Chip.
Storage SoPEC | An ISISlave SoPEC used as a DRAM store and which does not print.
Tag | Refers to a pattern which encodes information about its position and orientation, which allows it to be optically located and its data contents read.

[0480] 4.3 Acronyms and Abbreviations

[0481] The following acronyms and abbreviations are used in this specification:
CFU | Contone FIFO Unit
CPU | Central Processing Unit
DIU | DRAM Interface Unit
DNC | Dead Nozzle Compensator
DRAM | Dynamic Random Access Memory
DWU | DotLine Writer Unit
GPIO | General Purpose Input Output
HCU | Halftoner Compositor Unit
ICU | Interrupt Controller Unit
ISI | Inter SoPEC Interface
LDB | Lossless Bi-level Decoder
LLU | Line Loader Unit
LSS | Low Speed Serial interface
MEMS | Micro Electro Mechanical System
MMU | Memory Management Unit
PCU | SoPEC Controller Unit
PHI | PrintHead Interface
PSS | Power Save Storage Unit
RDU | Real-time Debug Unit
ROM | Read Only Memory
SCB | Serial Communication Block
SFU | Spot FIFO Unit
SMG4 | Silverbrook Modified Group 4
SoPEC | Small office home office Print Engine Controller
SRAM | Static Random Access Memory
TE | Tag Encoder
TFU | Tag FIFO Unit
TIM | Timers Unit
USB | Universal Serial Bus

[0482] 4.4 Pseudocode Notation

[0483] In general the pseudocode examples use C like statements with some exceptions. Symbol and naming conventions used for pseudocode:
// | Comment
= | Assignment
==, !=, <, > | Operator equal, not equal, less than, greater than
+, -, *, /, % | Operator addition, subtraction, multiply, divide, modulus
&, |, ^, <<, >>, ~ | Bitwise AND, bitwise OR, bitwise exclusive OR, left shift, right shift, complement
AND, OR, NOT | Logical AND, Logical OR, Logical inversion
[XX:YY] | Array/vector specifier
{a, b, c} | Concatenation operation
++, -- | Increment and decrement

[0484] 4.4.1 Register and Signal Naming Conventions

[0485] In general register naming uses the C style conventions with capitalization to denote word delimiters. Signals use RTL style notation where underscores denote word delimiters. There is a direct translation between the two conventions. For example the CmdSourceFifo register is equivalent to the cmd_source_fifo signal.
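The translation between the two naming conventions can be sketched as follows. This is a minimal illustration only; the function names are hypothetical and not part of the specification, and the sketch assumes each word in a register name starts with a single capital letter (a run of capitals such as an acronym would be split letter by letter).

```python
import re


def register_to_signal(register_name: str) -> str:
    """Convert a C-style register name to an RTL-style signal name."""
    # Insert an underscore before every capital letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", register_name).lower()


def signal_to_register(signal_name: str) -> str:
    """Convert an RTL-style signal name back to a C-style register name."""
    return "".join(word.capitalize() for word in signal_name.split("_"))


print(register_to_signal("CmdSourceFifo"))
print(signal_to_register("cmd_source_fifo"))
```

Applied to the example in the text, the first call maps CmdSourceFifo to cmd_source_fifo and the second call reverses it.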

[0486] 4.5 State Machine Notation

[0487] State machines should be described using the pseudocode notation outlined above. State machine descriptions use the convention of underline to indicate the cause of a transition from one state to another and plain text (no underline) to indicate the effect of the transition i.e. signal transitions which occur when the new state is entered.

[0488] A sample state machine is shown in FIG. 1.

[0489] 5 Printing Considerations

[0490] A bi-lithic printhead produces 1600 dpi bi-level dots. On low-diffusion paper, each ejected drop forms a 22.5 μm diameter dot. Dots are easily produced in isolation, allowing dispersed-dot dithering to be exploited to its fullest. Since the bi-lithic printhead is the width of the page and operates with a constant paper velocity, color planes are printed in perfect registration, allowing ideal dot-on-dot printing. Dot-on-dot printing minimizes ‘muddying’ of midtones caused by inter-color bleed. A page layout may contain a mixture of images, graphics and text. Continuous-tone (contone) images and graphics are reproduced using a stochastic dispersed-dot dither. Unlike a clustered-dot (or amplitude-modulated) dither, a dispersed-dot (or frequency-modulated) dither reproduces high spatial frequencies (i.e. image detail) almost to the limits of the dot resolution, while simultaneously reproducing lower spatial frequencies to their full color depth, when spatially integrated by the eye. A stochastic dither matrix is carefully designed to be free of objectionable low-frequency patterns when tiled across the image. As such its size typically exceeds the minimum size required to support a particular number of intensity levels (e.g. 16×16×8 bits for 257 intensity levels).

[0491] Human contrast sensitivity peaks at a spatial frequency of about 3 cycles per degree of visual field and then falls off logarithmically, decreasing by a factor of 100 beyond about 40 cycles per degree and becoming immeasurable beyond 60 cycles per degree [25]. At a normal viewing distance of 12 inches (about 300 mm), this translates roughly to 200-300 cycles per inch (cpi) on the printed page, or 400-600 samples per inch according to Nyquist's theorem.

[0492] In practice, contone resolution above about 300 ppi is of limited utility outside special applications such as medical imaging. Offset printing of magazines, for example, uses contone resolutions in the range 150 to 300 ppi. Higher resolutions contribute slightly to color error through the dither.

[0493] Black text and graphics are reproduced directly using bi-level black dots, and are therefore not anti-aliased (i.e. low-pass filtered) before being printed. Text should therefore be supersampled beyond the perceptual limits discussed above, to produce smoother edges when spatially integrated by the eye. Text resolution up to about 1200 dpi continues to contribute to perceived text sharpness (assuming low-diffusion paper, of course).

[0494] A Netpage printer, for example, may use a contone resolution of 267 ppi (i.e. 1600 dpi/6), and a black text and graphics resolution of 800 dpi. A high end office or departmental printer may use a contone resolution of 320 ppi (1600 dpi/5) and a black text and graphics resolution of 1600 dpi. Both formats are capable of exceeding the quality of commercial (offset) printing and photographic reproduction.
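The conversion from cycles per degree to cycles per inch at a 12 inch viewing distance can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes simple flat-page geometry in which one degree of visual field spans 2·d·tan(0.5°) on the page.

```python
import math


def cycles_per_inch(cycles_per_degree: float, viewing_distance_in: float = 12.0) -> float:
    """Spatial frequency on the page for a given angular frequency."""
    # Width on the page (in inches) subtended by one degree of visual field.
    inches_per_degree = 2.0 * viewing_distance_in * math.tan(math.radians(0.5))
    return cycles_per_degree / inches_per_degree


for cpd in (40, 60):
    cpi = cycles_per_inch(cpd)
    # Nyquist: two samples are needed per cycle.
    print(f"{cpd} cycles/degree -> {cpi:.0f} cpi, {2 * cpi:.0f} samples/inch")
```

Under these assumptions, 40 and 60 cycles per degree come out slightly below 200 and 300 cpi respectively, in line with the rough figures quoted.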

[0495] 6 Document Data Flow

[0496] 6.1 Considerations

[0497] Because of the page-width nature of the bi-lithic printhead, each page must be printed at a constant speed to avoid creating visible artifacts. This means that the printing speed can't be varied to match the input data rate. Document rasterization and document printing are therefore decoupled to ensure the printhead has a constant supply of data. A page is never printed until it is fully rasterized. This can be achieved by storing a compressed version of each rasterized page image in memory. This decoupling also allows the RIP(s) to run ahead of the printer when rasterizing simple pages, buying time to rasterize more complex pages.

[0498] Because contone color images are reproduced by stochastic dithering, but black text and line graphics are reproduced directly using dots, the compressed page image format contains a separate foreground bi-level black layer and background contone color layer. The black layer is composited over the contone layer after the contone layer is dithered (although the contone layer has an optional black component). A final layer of Netpage tags (in infrared or black ink) is optionally added to the page for printout.

[0499]FIG. 2 shows the flow of a document from computer system to printed page.

[0500] At 267 ppi for example, an A4 page (8.26 inches×11.7 inches) of contone CMYK data has a size of 26.3 MB. At 320 ppi, an A4 page of contone data has a size of 37.8 MB. Using lossy contone compression algorithms such as JPEG [27], contone images compress with a ratio up to 10:1 without noticeable loss of quality, giving compressed page sizes of 2.63 MB at 267 ppi and 3.78 MB at 320 ppi.

[0501] At 800 dpi, an A4 page of bi-level data has a size of 7.4 MB. At 1600 dpi, an A4 page of bi-level data has a size of 29.5 MB. Coherent data such as text compresses very well. Using lossless bi-level compression algorithms such as SMG4 fax as discussed in Section 8.1.2.3.1, ten-point plain text compresses with a ratio of about 50:1. Lossless bi-level compression across an average page is about 20:1 with 10:1 possible for pages which compress poorly. The requirement for SoPEC is to be able to print text at 10:1 compression. Assuming 10:1 compression gives compressed page sizes of 0.74 MB at 800 dpi, and 2.95 MB at 1600 dpi.
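The uncompressed page sizes quoted above can be reproduced with a short calculation. The sketch below is illustrative: the function names are hypothetical, and it assumes 8.26 inch × 11.7 inch page dimensions, 8 bits per contone sample, four contone channels (CMYK), and that "MB" means 1024 × 1024 bytes, which is the interpretation under which the quoted figures come out.

```python
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.26, 11.7
MB = 1024 * 1024  # assumption: binary megabytes


def contone_page_mb(ppi: int, channels: int = 4, bits_per_sample: int = 8) -> float:
    """Uncompressed size of a contone page in MB."""
    samples = PAGE_WIDTH_IN * ppi * PAGE_HEIGHT_IN * ppi * channels
    return samples * bits_per_sample / 8 / MB


def bilevel_page_mb(dpi: int) -> float:
    """Uncompressed size of a single bi-level (1 bit per dot) page in MB."""
    dots = PAGE_WIDTH_IN * dpi * PAGE_HEIGHT_IN * dpi
    return dots / 8 / MB


for ppi in (267, 320):
    size = contone_page_mb(ppi)
    print(f"contone  {ppi:4d} ppi: {size:5.1f} MB, {size / 10:.2f} MB at 10:1")
for dpi in (800, 1600):
    size = bilevel_page_mb(dpi)
    print(f"bi-level {dpi:4d} dpi: {size:5.1f} MB, {size / 10:.2f} MB at 10:1")
```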

[0502] Once dithered, a page of CMYK contone image data consists of 116 MB of bi-level data. Using lossless bi-level compression algorithms on this data is pointless precisely because the optimal dither is stochastic, i.e. it introduces hard-to-compress disorder.

[0503] Netpage tag data is optionally supplied with the page image. Rather than storing a compressed bi-level data layer for the Netpage tags, the tag data is stored in its raw form. Each tag is supplied with up to 120 bits of raw variable data (combined with up to 56 bits of raw fixed data) and covers up to a 6 mm×6 mm area (at 1600 dpi). The absolute maximum number of tags on an A4 page is 15,540 when the tag is only 2 mm×2 mm (each tag is 126 dots×126 dots, for a total coverage of 148 tags×105 tags). 15,540 tags of 128 bits per tag gives a compressed tag page size of 0.24 MB.

[0504] The multi-layer compressed page image format therefore exploits the relative strengths of lossy JPEG contone image compression, lossless bi-level text compression, and tag encoding. The format is compact enough to be storage-efficient, and simple enough to allow straightforward real-time expansion during printing.
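The tag count and tag data size can be reproduced as follows. This is a sketch under stated assumptions: the names are hypothetical, the page is taken as ISO A4 (210 mm × 297 mm), only whole tags are counted in each direction, and "MB" is taken as 1024 × 1024 bytes.

```python
MM_PER_INCH = 25.4
MB = 1024 * 1024  # assumption: binary megabytes


def tag_budget(tag_mm: float = 2.0, page_mm=(210.0, 297.0),
               bits_per_tag: int = 128, dpi: int = 1600):
    """Return (dots per tag side, tag count, raw tag data size in MB)."""
    dots_per_side = round(tag_mm / MM_PER_INCH * dpi)
    tags_across = int(page_mm[0] // tag_mm)  # whole tags across the page
    tags_down = int(page_mm[1] // tag_mm)    # whole tags down the page
    count = tags_across * tags_down
    return dots_per_side, count, count * bits_per_tag / 8 / MB


dots, count, size_mb = tag_budget()
print(dots, count, round(size_mb, 2))
```

With the default arguments this gives 126 dots per tag side, 105 × 148 = 15,540 tags, and roughly 0.24 MB of tag data.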

[0505] Since text and images normally don't overlap, the normal worst-case page image size is image only, while the normal best-case page image size is text only. The addition of worst case Netpage tags adds 0.24 MB to the page image size. The worst-case page image size is text over image plus tags. The average page size assumes a quarter of an average page contains images. Table 1 shows data sizes for a compressed A4 page for these different options.

TABLE 1 Data sizes for A4 page (8.26 inches × 11.7 inches)
Page content | 267 ppi contone, 800 dpi bi-level | 320 ppi contone, 1600 dpi bi-level
Image only (contone), 10:1 compression | 2.63 MB | 3.78 MB
Text only (bi-level), 10:1 compression | 0.74 MB | 2.95 MB
Netpage tags, 1600 dpi | 0.24 MB | 0.24 MB
Worst case (text + image + tags) | 3.61 MB | 6.67 MB
Average (text + 25% image + tags) | 1.64 MB | 4.25 MB

[0506] 6.2 Document Data Flow

[0507] The Host PC rasterizes and compresses the incoming document on a page by page basis. The page is restructured into bands with one or more bands used to construct a page. The compressed data is then transferred to the SoPEC device via the USB link. A complete band is stored in SoPEC embedded memory. Once the band transfer is complete the SoPEC device reads the compressed data, expands the band, normalizes contone, bi-level and tag data to 1600 dpi and transfers the resultant calculated dots to the bi-lithic printhead.

[0508] The document data flow is:

[0509] The RIP software rasterizes each page description and compresses the rasterized page image.

[0510] The infrared layer of the printed page optionally contains encoded Netpage [5] tags at a programmable density.

[0511] The compressed page image is transferred to the SoPEC device via the USB normally on a band by band basis.

[0512] The print engine takes the compressed page image and starts the page expansion.

[0513] The first stage page expansion consists of 3 operations performed in parallel:

[0514] expansion of the JPEG-compressed contone layer

[0515] expansion of the SMG4 fax compressed bi-level layer

[0516] encoding and rendering of the bi-level tag data.

[0517] The second stage dithers the contone layer using a programmable dither matrix, producing up to four bi-level layers at full-resolution.

[0518] The second stage then composites the bi-level tag data layer, the bi-level SMG4 fax de-compressed layer and up to four bi-level JPEG de-compressed layers into the full-resolution page image.

[0519] A fixative layer is also generated as required.

[0520] The last stage formats and prints the bi-level data through the bi-lithic printhead via the printhead interface.
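The dither and composite stages in the flow above can be sketched in a few lines. This is an illustrative model only and not the SoPEC implementation: the function names are hypothetical, the 2×2 threshold matrix is a toy placeholder rather than a stochastic dither matrix, and the compositing rule (a foreground black dot turns on K and suppresses the colour dot beneath it) is an assumption made for the example.

```python
def dither_plane(contone, matrix, scale):
    """Scale a contone plane up by `scale` and threshold it against a tiled matrix."""
    n = len(matrix)  # assumes a square n x n threshold matrix
    height, width = len(contone) * scale, len(contone[0]) * scale
    return [[1 if contone[y // scale][x // scale] > matrix[y % n][x % n] else 0
             for x in range(width)]
            for y in range(height)]


def composite_black(planes, black):
    """Composite a full-resolution bi-level black layer over dithered planes."""
    out = {}
    for name, plane in planes.items():
        if name == "K":
            # Assumption: foreground black adds dots to the K plane.
            out[name] = [[d | b for d, b in zip(prow, brow)]
                         for prow, brow in zip(plane, black)]
        else:
            # Assumption: a foreground black dot suppresses the colour dot under it.
            out[name] = [[d & (1 - b) for d, b in zip(prow, brow)]
                         for prow, brow in zip(plane, black)]
    return out


TOY_MATRIX = [[0, 128], [192, 64]]            # placeholder thresholds
cyan = dither_plane([[100]], TOY_MATRIX, 2)   # one contone pixel -> 2x2 dots
page = composite_black({"C": cyan, "K": [[0, 0], [0, 0]]}, [[1, 1], [0, 0]])
print(cyan)
print(page)
```

The `scale` argument stands in for the normalisation to 1600 dpi, for example 6 for 267 ppi contone data or 5 for 320 ppi.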

[0521] The SoPEC device can print a full resolution page with 6 color planes. Each of the color planes can be generated from compressed data through any channel (either JPEG compressed, bi-level SMG4 fax compressed, tag data generated, or fixative channel created) with a maximum number of 6 data channels from page RIP to bi-lithic printhead color planes.

[0522] The mapping of data channels to color planes is programmable; this allows for multiple color planes in the printhead to map to the same data channel to provide for redundancy in the printhead to assist dead nozzle compensation.

[0523] Also a data channel could be used to gate data from another data channel. For example in stencil mode, data from the bilevel data channel at 1600 dpi can be used to filter the contone data channel at 320 dpi, giving the effect of 1600 dpi contone image.
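A programmable channel-to-plane mapping and the gating of one channel by another can be modelled as below. The sketch is hypothetical throughout: the plane and channel names, the dictionary-based mapping and the helper functions are illustrative assumptions, not the SoPEC register interface.

```python
# Hypothetical mapping: two black planes share one data channel for redundancy.
PLANE_SOURCE = {
    "cyan": "contone_c",
    "magenta": "contone_m",
    "yellow": "contone_y",
    "black_a": "bilevel",
    "black_b": "bilevel",
    "infrared": "tags",
}


def planes_from_channels(channels, plane_source):
    """Route each printhead colour plane to its configured data channel."""
    return {plane: channels[source] for plane, source in plane_source.items()}


def gate(data_dots, mask_dots):
    """Stencil-style gating: a data dot passes only where the mask dot is set."""
    return [d & m for d, m in zip(data_dots, mask_dots)]


channels = {
    "contone_c": [1, 1, 0, 1], "contone_m": [0, 1, 0, 0],
    "contone_y": [0, 0, 1, 1], "bilevel": [1, 0, 1, 1], "tags": [0, 0, 0, 1],
}
planes = planes_from_channels(channels, PLANE_SOURCE)
print(planes["black_a"] is planes["black_b"])            # both fed by one channel
print(gate(channels["contone_c"], channels["bilevel"]))  # dithered contone dots gated by the bi-level mask
```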

[0524] 6.3 Page Considerations Due to SoPEC

[0525] The SoPEC device typically stores a complete page of document data on chip. The amount of storage available for compressed pages is limited to 2 Mbytes, imposing a fixed maximum on compressed page size. A comparison of the compressed image sizes in Table 2 indicates that SoPEC would not be capable of printing worst case pages unless they are split into bands and printing commences before all the bands for the page have been downloaded. The page sizes in the table are shown for comparison purposes and would be considered reasonable for a professional level printing system. The SoPEC device is aimed at the consumer level and would not be required to print pages of that complexity. Target document types for the SoPEC device are shown in Table 2.

TABLE 2 Page content targets for SoPEC
Page Content Description | Calculation | Size (MByte)
Best Case picture Image, 267 ppi with 3 colors, A4 size | 8.26 × 11.7 × 267 × 267 × 3 @ 10:1 | 1.97
Full page text, 800 dpi A4 size | 8.26 × 11.7 × 800 × 800 @ 10:1 | 0.74
Mixed Graphics and Text: Image of 6 inches × 4 inches @ 267 ppi and 3 colors; Remaining area text ~73 inches², 800 dpi | 6 × 4 × 267 × 267 × 3 @ 5:1; 800 × 800 × 73 @ 10:1 | 1.55
Best Case Photo, 3 Colors, 6.6 MegaPixel Image | 6.6 Mpixel @ 10:1 | 2.00
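The Table 2 calculations and the comparison against the 2 Mbyte compressed page store can be sketched as follows. The function names are hypothetical, and the sketch assumes one byte per contone sample, one bit per bi-level dot, and 1 MByte = 1024 × 1024 bytes; small rounding differences from the tabulated values are possible.

```python
MB = 1024 * 1024   # assumption: binary megabytes
STORE_MB = 2.0     # compressed page store limit quoted in the text


def image_mb(width_in, height_in, ppi, colors, ratio):
    """Compressed contone image size in MB (one byte per sample before compression)."""
    return width_in * height_in * ppi * ppi * colors / ratio / MB


def text_mb(area_sq_in, dpi, ratio):
    """Compressed bi-level text size in MB (one bit per dot before compression)."""
    return area_sq_in * dpi * dpi / 8 / ratio / MB


targets = {
    "picture, A4, 267 ppi, 3 colors": image_mb(8.26, 11.7, 267, 3, 10),
    "full page text, A4, 800 dpi": text_mb(8.26 * 11.7, 800, 10),
    "mixed graphics and text": image_mb(6, 4, 267, 3, 5) + text_mb(73, 800, 10),
}
for name, size in targets.items():
    verdict = "fits" if size <= STORE_MB else "needs banding"
    print(f"{name}: {size:.2f} MB ({verdict})")
```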

[0526] If a document with more complex pages is required, the page RIP software in the host PC can determine that there is insufficient memory storage in the SoPEC for that document. In such cases the RIP software can take two courses of action. It can increase the compression ratio until the compressed page size will fit in the SoPEC device, at the expense of document quality, or divide the page into bands and allow SoPEC to begin printing a page band before all bands for that page are downloaded. Once SoPEC starts printing a page it cannot stop. If SoPEC consumes compressed data faster than the bands can be downloaded, a buffer underrun error could occur, causing the print to fail. A buffer underrun occurs if a line synchronisation pulse is received before a line of data has been transferred to the printhead.

[0527] Other options which can be considered if the page does not fit completely into the compressed page store are to slow the printing or to use multiple SoPECs to print parts of the page. A Storage SoPEC (Section 7.2.5) could be added to the system to provide guaranteed bandwidth data delivery. The print system could also be constructed using an ISI-Bridge chip (Section 7.2.6) to provide guaranteed data delivery.
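The underrun risk when printing starts before all bands have arrived can be illustrated with a simple timing model. This is a deliberately simplified sketch, not the SoPEC mechanism: the function and parameter names are hypothetical, bands are assumed to download back to back at a constant rate, each band is assumed to take a fixed time to print, a band must be fully downloaded before its printing starts, and the 2 Mbyte store limit is ignored.

```python
def find_underrun(band_bytes, band_print_s, download_bytes_per_s,
                  bands_buffered_before_start=1):
    """Return the index of the first band that arrives too late, or None."""
    # Time at which each band has completely arrived.
    arrival, t = [], 0.0
    for size in band_bytes:
        t += size / download_bytes_per_s
        arrival.append(t)

    # Printing starts once the requested number of bands is buffered
    # and then proceeds without pause (constant paper speed).
    needed_at = arrival[bands_buffered_before_start - 1]
    for index, (arrived_at, duration) in enumerate(zip(arrival, band_print_s)):
        if arrived_at > needed_at:
            return index
        needed_at += duration
    return None


bands = [500_000] * 4    # four equal bands, sizes in bytes (illustrative)
print_time = [0.5] * 4   # seconds needed to print each band (illustrative)
print(find_underrun(bands, print_time, 1_000_000))  # download keeps pace
print(find_underrun(bands, print_time, 800_000))    # download falls behind
```

In this model, buffering more bands before starting (a larger `bands_buffered_before_start`) or raising the compression ratio (smaller `band_bytes`) both reduce the chance of an underrun, which corresponds to the two courses of action described above.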

[0528] 7 Memjet Printer Architecture

[0529] The SoPEC device can be used in several printer configurations and architectures.

[0530] In the general sense every SoPEC based printer architecture will contain:

[0531] One or more SoPEC devices.

[0532] One or more bi-lithic printheads.

[0533] Two or more LSS busses.

[0534] Two or more QA chips.

[0535] USB 1.1 connection to host or ISI connection to Bridge Chip.

[0536] ISI bus connection between SoPECs (when multiple SoPECs are used).

[0537] Some example printer configurations are outlined in Section 7.2. The various system components are outlined briefly in Section 7.1.

[0538] 7.1 System Components

[0539] 7.1.1 SoPEC Print Engine Controller

[0540] The SoPEC device contains several system on a chip (SoC) components, as well as the print engine pipeline control application specific logic.

[0541] 7.1.1.1 Print Engine Pipeline (PEP) Logic

[0542] The PEP reads compressed page store data from the embedded memory, optionally decompresses the data and formats it for sending to the printhead. The print engine pipeline functionality includes expanding the page image, dithering the contone layer, compositing the black layer over the contone layer, rendering of Netpage tags, compensation for dead nozzles in the printhead, and sending the resultant image to the bi-lithic printhead.

[0543] 7.1.1.2 Embedded CPU

[0544] SoPEC contains an embedded CPU for general purpose system configuration and management. The CPU performs page and band header processing, motor control and sensor monitoring (via the GPIO) and other system control functions. The CPU can perform buffer management or report buffer status to the host. The CPU can optionally run vendor application specific code for general print control such as paper ready monitoring and LED status update.

[0545] 7.1.1.3 Embedded Memory Buffer

[0546] A 2.5 Mbyte embedded memory buffer is integrated onto the SoPEC device, of which approximately 2 Mbytes are available for compressed page store data. A compressed page is divided into one or more bands, with a number of bands stored in memory. As a band of the page is consumed by the PEP for printing, a new band can be downloaded. The new band may be for the current page or the next page.

[0547] Using banding it is possible to begin printing a page before the complete compressed page is downloaded, but care must be taken to ensure that data is always available for printing or a buffer underrun may occur.

[0548] A Storage SoPEC acting as a memory buffer (Section 7.2.5) or an ISI-Bridge chip with attached DRAM (Section 7.2.6) could be used to provide guaranteed data delivery.

[0549] 7.1.1.4 Embedded USB 1.1 Device

[0550] The embedded USB 1.1 device accepts compressed page data and control commands from the host PC, and facilitates the data transfer to either embedded memory or to another SoPEC device in multi-SoPEC systems.

[0551] 7.1.2 Bi-lithic Printhead

[0552] The printhead is constructed by abutting 2 printhead ICs together. The printhead ICs can vary in size from 2 inches to 8 inches, so to produce an A4 printhead several combinations are possible. For example, two printhead ICs of 7 inches and 3 inches could be used to create an A4 printhead (the notation is 7:3). Similarly, a 6 and 4 combination (6:4) or a 5:5 combination could be used. An A3 printhead can be constructed from an 8:6 or a 7:7 printhead IC combination. For photographic printing smaller printheads can be constructed.
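The combinations above can be enumerated mechanically. The following Python sketch is illustrative only; it assumes whole-inch IC lengths and total lengths of 10 inches (A4) and 14 inches (A3), which are inferred from the 7:3 and 8:6 notation rather than stated. The function name is hypothetical. Note that the 2 to 8 inch range also admits an 8:2 pairing for A4, which is not listed among the examples.

```python
MIN_IC_INCHES = 2   # shortest printhead IC, from the text
MAX_IC_INCHES = 8   # longest printhead IC, from the text


def ic_combinations(total_inches):
    """Return (longer, shorter) IC length pairs that abut to the given total."""
    pairs = []
    for longer in range(MAX_IC_INCHES, MIN_IC_INCHES - 1, -1):
        shorter = total_inches - longer
        # Keep each pair once, with the longer IC listed first.
        if MIN_IC_INCHES <= shorter <= longer:
            pairs.append((longer, shorter))
    return pairs


print("A4 (assumed 10 inches):", ic_combinations(10))
print("A3 (assumed 14 inches):", ic_combinations(14))
```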

[0553] 7.1.3 LSS Interface Bus

[0554] Each SoPEC device has 2 LSS system buses for communication with QA devices for system authentication and ink usage accounting. The number of QA devices per bus and their position in the system is unrestricted with the exception that PRINTER_QA and INK_QA devices should be on separate LSS busses.

[0555] 7.1.4 QA Devices

[0556] Each SoPEC system can have several QA devices. Normally each printing SoPEC will have an associated PRINTER_QA. Ink cartridges will contain an INK_QA chip. PRINTER_QA and INK_QA devices should be on separate LSS busses. All QA chips in the system are physically identical, with flash memory contents distinguishing a PRINTER_QA from an INK_QA chip.

[0557] 7.1.5 ISI Interface

[0558] The Inter-SoPEC Interface (ISI) provides a communication channel between SoPECs in a multi-SoPEC system. The ISIMaster can be a SoPEC device or an ISI-Bridge chip depending on the printer configuration. Both compressed data and control commands are transferred via the interface.

[0559] 7.1.6 ISI-Bridge Chip

[0560] The ISI-Bridge chip is a device, other than a SoPEC with a USB connection, which provides print data to a number of slave SoPECs. A bridge chip will typically have a high bandwidth connection, such as USB2.0, Ethernet or IEEE1394, to a host and may have an attached external DRAM for compressed page storage. A bridge chip would have one or more ISI interfaces. The use of multiple ISI buses would allow the construction of independent print systems within the one printer. The ISI-Bridge would be the ISIMaster for each of the ISI buses it interfaces to.

[0561] 7.2 Possible SoPEC Systems

[0562] Several possible SoPEC based system architectures exist. The following sections outline some possible architectures. It is possible to have extra SoPEC devices in the system used for DRAM storage. The QA chip configurations shown are indicative of the flexibility of the LSS bus architecture, but are not limited to those configurations.

[0563] 7.2.1 A4 Simplex with 1 SoPEC Device

[0564] In FIG. 3, a single SoPEC device can be used to control two printhead ICs. The SoPEC receives compressed data through the USB device from the host. The compressed data is processed and transferred to the printhead.

[0565] 7.2.2 A4 Duplex with 2 SoPEC Devices

[0566] In FIG. 4, two SoPEC devices are used to control two bi-lithic printheads, each with two printhead ICs. Each bi-lithic printhead prints to opposite sides of the same page to achieve duplex printing. The SoPEC connected to the USB is the ISIMaster SoPEC; the remaining SoPEC is an ISISlave. The ISIMaster receives all the compressed page data for both SoPECs and re-distributes the compressed data over the Inter-SoPEC Interface (ISI) bus.

[0567] It may not be possible to print an A4 page every 2 seconds in this configuration since the USB 1.1 connection to the host may not have enough bandwidth. An alternative would be for each SoPEC to have its own USB 1.1 connection. This would allow a faster average print speed.

[0568] 7.2.3 A3 Simplex with 2 SoPEC Devices

[0569] In FIG. 5, two SoPEC devices are used to control one A3 bi-lithic printhead. Each SoPEC controls only one printhead IC (the remaining PHI port typically remains idle). This system uses the SoPEC with the USB connection as the ISIMaster. In this dual SoPEC configuration the compressed page store data is split across 2 SoPECs, giving a total of 4 Mbyte page store. This allows the system to use compression rates as in an A4 architecture, but with the increased page size of A3. The ISIMaster receives all the compressed page data for all SoPECs and re-distributes the compressed data over the Inter-SoPEC Interface (ISI) bus.

[0570] It may not be possible to print an A3 page every 2 seconds in this configuration since the USB 1.1 connection to the host will only have enough bandwidth to supply 2 Mbytes every 2 seconds. Pages which require more than 2 Mbytes every 2 seconds will therefore print more slowly. An alternative would be for each SoPEC to have its own USB 1.1 connection. This would allow a faster average print speed.
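The effect of the USB limit on print speed can be estimated as follows. This Python sketch is illustrative only: it assumes a sustained host link of 2 Mbytes per 2 seconds with 1 Mbyte = 2^20 bytes, a target of 2 seconds per page, and a steady state in which each page must be fully delivered over the single link. The names are hypothetical.

```python
MBYTE = 2 ** 20
USB_BYTES_PER_SECOND = 2 * MBYTE / 2.0   # 2 Mbytes every 2 seconds, from the text
TARGET_PAGE_SECONDS = 2.0                # target of one page every 2 seconds


def min_page_seconds(compressed_page_bytes):
    """Shortest sustainable time per page when the host link is the bottleneck."""
    download_seconds = compressed_page_bytes / USB_BYTES_PER_SECOND
    return max(TARGET_PAGE_SECONDS, download_seconds)


for mbytes in (1, 2, 4):
    print(f"{mbytes} Mbyte page: {min_page_seconds(mbytes * MBYTE):.1f} s per page")
```

Under these assumptions a page that fills the whole 4 Mbyte dual-SoPEC page store prints at half the target speed.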

[0571] 7.2.4 A3 Duplex with 4 SoPEC Devices

[0572] In FIG. 6 a 4 SoPEC system is shown. It contains 2 A3 bi-lithic printheads, one for each side of an A3 page. Each printhead contains 2 printhead ICs, and each printhead IC is controlled by an independent SoPEC device, with the remaining PHI port typically unused. Again the SoPEC with the USB 1.1 connection is the ISIMaster, with the other SoPECs as ISISlaves. In total, the system contains 8 Mbytes of compressed page store (2 Mbytes per SoPEC), so the increased page size does not degrade the system print quality from that of an A4 simplex printer. The ISIMaster receives all the compressed page data for all SoPECs and re-distributes the compressed data over the Inter-SoPEC Interface (ISI) bus.

[0573] It may not be possible to print an A3 page every 2 seconds in this configuration since the USB 1.1 connection to the host will only have enough bandwidth to supply 2 Mbytes every 2 seconds. Pages which require more than 2 Mbytes every 2 seconds will therefore print more slowly. An alternative would be for each SoPEC or set of SoPECs on the same side of the page to have their own USB 1.1 connection (as ISISlaves may also have direct USB connections to the host). This would allow a faster average print speed.

[0574] 7.2.5 SoPEC DRAM storage solution: A4 Simplex with 1 printing SoPEC and 1 memory SoPEC

Extra SoPECs can be used for DRAM storage, e.g. in FIG. 7 an A4 simplex printer can be built with a single extra SoPEC used for DRAM storage. The DRAM SoPEC can provide guaranteed bandwidth delivery of data to the printing SoPEC. SoPEC configurations can have multiple extra SoPECs used for DRAM storage.

[0575] 7.2.6 ISI-Bridge Chip Solution: A3 Duplex System with 4 SoPEC Devices

[0576] In FIG. 8, an ISI-Bridge chip provides slave-only ISI connections to SoPEC devices. FIG. 8 shows an ISI-Bridge chip with 2 separate ISI ports. The ISI-Bridge chip is the ISIMaster on each of the ISI busses it is connected to. All connected SoPECs are ISISlaves. The ISI-Bridge chip will typically have a high bandwidth connection to a host and may have an attached external DRAM for compressed page storage.

[0577] An alternative to having an ISI-Bridge chip would be for each SoPEC or each set of SoPECs on the same side of a page to have their own USB 1.1 connection. This would allow a faster average print speed.

[0578] 8 Page Format and Printflow

[0579] When rendering a page, the RIP produces a page header and a number of bands (a non-blank page requires at least one band) for a page. The page header contains high level rendering parameters, and each band contains compressed page data. The size of the band will depend on the memory available to the RIP, the speed of the RIP, and the amount of memory remaining in SoPEC while printing the previous band(s). FIG. 9 shows the high level data structure of a number of pages with different numbers of bands in the page.

[0580] Each compressed band contains a mandatory band header, an optional bi-level plane, optional sets of interleaved contone planes, and an optional tag data plane (for Netpage enabled applications). Since each of these planes is optional¹, the band header specifies which planes are included with the band. FIG. 10 gives a high-level breakdown of the contents of a page band.

[0581] A single SoPEC has maximum rendering restrictions as follows:

[0582] 1 bi-level plane

[0583] 1 contone interleaved plane set containing a maximum of 4 contone planes

[0584] 1 tag data plane

[0585] a bi-lithic printhead with a maximum of 2 printhead ICs

[0586] The requirement for single-sided A4 single SoPEC printing is:

[0587] average contone JPEG compression ratio of 10:1, with a local minimum compression ratio of 5:1 for a single line of interleaved JPEG blocks.

[0588] average bi-level compression ratio of 10:1, with a local minimum compression ratio of 1:1 for a single line.

[0589] If the page contains rendering parameters that exceed these specifications, then the RIP or the Host PC must split the page into a format that can be handled by a single SoPEC.

[0590] In the general case, the SoPEC CPU must analyze the page and band headers and generate an appropriate set of register write commands to configure the units in SoPEC for that page. The various bands are passed to the destination SoPEC(s) to locations in DRAM determined by the host.

[0591] The host keeps a memory map for the DRAM, and ensures that as a band is passed to a SoPEC, it is stored in a suitable free area in DRAM. Each SoPEC is connected to the ISI bus or USB bus via its Serial Communication Block (SCB). The SoPEC CPU configures the SCB to allow compressed data bands to pass from the USB or ISI through the SCB to SoPEC DRAM. FIG. 11 shows an example data flow for a page destined to be printed by a single SoPEC. Band usage information is generated by the individual SoPECs and passed back to the host.

[0592] SoPEC has an addressing mechanism that permits circular band memory allocation, thus facilitating easy memory management. However, it is not strictly necessary that all bands be stored together. As long as the appropriate registers in SoPEC are set up for each band, and a given band is contiguous², the memory can be allocated in any way.

[0593] 8.1 Print Engine Example Page Format

[0594] This section describes a possible format of compressed pages expected by the embedded CPU in SoPEC. The format is generated by software in the host PC and interpreted by embedded software in SoPEC. This section indicates the type of information in a page format structure, but implementations need not be limited to this format. The host PC can optionally perform the majority of the header processing.

[0595] The compressed format and the print engines are designed to allow real-time page expansion during printing, to ensure that printing is never interrupted in the middle of a page due to data underrun.

[0596] The page format described here is for a single black bi-level layer, a contone layer, and a Netpage tag layer. The black bi-level layer is defined to composite over the contone layer.

[0597] The black bi-level layer consists of a bitmap containing a 1-bit opacity for each pixel. This black layer matte has a resolution which is an integer or non-integer factor of the printer's dot resolution.

[0598] The highest supported resolution is 1600 dpi, i.e. the printer's full dot resolution.

[0599] The contone layer, optionally passed in as YCrCb, consists of a 24-bit CMY or 32-bit CMYK color for each pixel. This contone image has a resolution which is an integer or non-integer factor of the printer's dot resolution. The requirement for a single SoPEC is to support 1 side per 2 seconds A4/Letter printing at a resolution of 267 ppi, i.e. one-sixth the printer's dot resolution.

[0600] Non-integer scaling can be performed on both the contone and bi-level images. Only integer scaling can be performed on the tag data.

[0601] The black bi-level layer and the contone layer are both in compressed form for efficient storage in the printer's internal memory.

[0602] 8.1.1 Page Structure

[0603] A single SoPEC is able to print with full edge bleed for Letter and A3 via different stitch part combinations of the bi-lithic printhead. It imposes no margins and so has a printable page area which corresponds to the size of its paper. The target page size is constrained by the printable page area, less the explicit (target) left and top margins specified in the page description. These relationships are illustrated below.

[0604] 8.1.2 Compressed Page Format

[0605] Apart from being implicitly defined in relation to the printable page area, each page description is complete and self-contained. There is no data stored separately from the page description to which the page description refers.³ The page description consists of a page header which describes the size and resolution of the page, followed by one or more page bands which describe the actual page content.

[0606] 8.1.2.1 Page Header

[0607] Table 3 shows an example format of a page header.

TABLE 3 Page header format
field | format | description
signature | 16-bit integer | Page header format signature.
version | 16-bit integer | Page header format version number.
structure size | 16-bit integer | Size of page header.
band count | 16-bit integer | Number of bands specified for this page.
target resolution (dpi) | 16-bit integer | Resolution of target page. This is always 1600 for the Memjet printer.
target page width | 16-bit integer | Width of target page, in dots.
target page height | 32-bit integer | Height of target page, in dots.
target left margin for black and contone | 16-bit integer | Width of target left margin, in dots, for black and contone.
target top margin for black and contone | 16-bit integer | Height of target top margin, in dots, for black and contone.
target right margin for black and contone | 16-bit integer | Width of target right margin, in dots, for black and contone.
target bottom margin for black and contone | 16-bit integer | Height of target bottom margin, in dots, for black and contone.
target left margin for tags | 16-bit integer | Width of target left margin, in dots, for tags.
target top margin for tags | 16-bit integer | Height of target top margin, in dots, for tags.
target right margin for tags | 16-bit integer | Width of target right margin, in dots, for tags.
target bottom margin for tags | 16-bit integer | Height of target bottom margin, in dots, for tags.
generate tags | 16-bit integer | Specifies whether to generate tags for this page (0 - no, 1 - yes).
fixed tag data | 128-bit integer | This is only valid if generate tags is set.
tag vertical scale factor | 16-bit integer | Scale factor in vertical direction from tag data resolution to target resolution. Valid range = 1-511. Integer scaling only.
tag horizontal scale factor | 16-bit integer | Scale factor in horizontal direction from tag data resolution to target resolution. Valid range = 1-511. Integer scaling only.
bi-level layer vertical scale factor | 16-bit integer | Scale factor in vertical direction from bi-level resolution to target resolution (must be 1 or greater). May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
bi-level layer horizontal scale factor | 16-bit integer | Scale factor in horizontal direction from bi-level resolution to target resolution (must be 1 or greater). May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
bi-level layer page width | 16-bit integer | Width of bi-level layer page, in pixels.
bi-level layer page height | 32-bit integer | Height of bi-level layer page, in pixels.
contone flags | 16-bit integer | Defines the color conversion that is required for the JPEG data. Bits 2-0 specify how many contone planes there are (e.g. 3 for CMY and 4 for CMYK). Bit 3 specifies whether the first 3 color planes need to be converted back from YCrCb to CMY; only valid if bits 2-0 = 3 or 4 (0 - no conversion, leave JPEG colors alone; 1 - color convert). Bits 7-4 specify whether the YCrCb was generated directly from CMY, or whether it was converted to RGB first via the step R = 255-C, G = 255-M, B = 255-Y. Each of the color planes can be individually inverted: bit 4 for color plane 0, bit 5 for color plane 1, bit 6 for color plane 2, bit 7 for color plane 3 (0 - do not invert; 1 - invert). Bit 8 specifies whether the contone data is JPEG compressed or non-compressed (0 - JPEG compressed; 1 - non-compressed). The remaining bits are reserved (0).
contone vertical scale factor | 16-bit integer | Scale factor in vertical direction from contone channel resolution to target resolution. Valid range = 1-255. May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
contone horizontal scale factor | 16-bit integer | Scale factor in horizontal direction from contone channel resolution to target resolution. Valid range = 1-255. May be non-integer. Expressed as a fraction with the upper 8 bits the numerator and the lower 8 bits the denominator.
contone page width | 16-bit integer | Width of contone page, in contone pixels.
contone page height | 32-bit integer | Height of contone page, in contone pixels.
reserved | up to 128 bytes | Reserved and 0. Pads out page header to a multiple of 128 bytes.
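The fractional scale factors and the contone flags of Table 3 can be decoded with simple bit operations. The following Python sketch is illustrative only; the function names and the dictionary keys are hypothetical, and it treats each field as an already-extracted 16-bit unsigned value, since the byte order of the header is not specified here.

```python
from fractions import Fraction


def decode_scale_factor(field):
    """Upper 8 bits are the numerator, lower 8 bits the denominator."""
    numerator = (field >> 8) & 0xFF
    denominator = field & 0xFF
    if denominator == 0:
        raise ValueError("scale factor denominator must be non-zero")
    return Fraction(numerator, denominator)


def decode_contone_flags(flags):
    """Split the 16-bit contone flags field into its documented parts."""
    return {
        "planes": flags & 0x7,                               # bits 2-0
        "convert_ycrcb_to_cmy": bool((flags >> 3) & 1),      # bit 3
        "invert_plane": [bool((flags >> (4 + i)) & 1) for i in range(4)],  # bits 7-4
        "non_compressed": bool((flags >> 8) & 1),            # bit 8
    }


# 267 ppi contone scaled to 1600 dpi is a factor of 6, i.e. 6/1.
print(decode_scale_factor(0x0601))
# A non-integer factor of 3/2.
print(decode_scale_factor(0x0302))
# 3 planes, YCrCb conversion, planes 0-2 inverted, JPEG compressed.
print(decode_contone_flags(0x007B))
```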

[0608] The page header contains a signature and version which allow the CPU to identify the page header format. If the signature and/or version are missing or incompatible with the CPU, then the CPU can reject the page.

[0609] The contone flags define how many contone layers are present, which typically is used for defining whether the contone layer is CMY or CMYK. Additionally, if the color planes are CMY, they can be optionally stored as YCrCb, and further optionally color space converted from CMY directly or via RGB. Finally the contone data is specified as being either JPEG compressed or non-compressed. The page header defines the resolution and size of the target page. The bi-level and contone layers are clipped to the target page if necessary. This happens whenever the bi-level or contone scale factors are not factors of the target page width or height.

[0610] The target left, top, right and bottom margins define the positioning of the target page within the printable page area.

[0611] The tag parameters specify whether or not Netpage tags should be produced for this page and what orientation the tags should be produced at (landscape or portrait mode). The fixed tag data is also provided.

[0612] The contone, bi-level and tag layer parameters define the page size and the scale factors.

[0613] 8.1.2.2 Band Format

[0614] Table 4 shows the format of the page band header.

TABLE 4 Band header format
field | format | description
signature | 16-bit integer | Page band header format signature.
version | 16-bit integer | Page band header format version number.
structure size | 16-bit integer | Size of page band header.
bi-level layer band height | 16-bit integer | Height of bi-level layer band, in black pixels.
bi-level layer band data size | 32-bit integer | Size of bi-level layer band data, in bytes.
contone band height | 16-bit integer | Height of contone band, in contone pixels.
contone band data size | 32-bit integer | Size of contone plane band data, in bytes.
tag band height | 16-bit integer | Height of tag band, in dots.
tag band data size | 32-bit integer | Size of unencoded tag data band, in bytes. Can be 0, which indicates that no tag data is provided.
reserved | up to 128 bytes | Reserved and 0. Pads out band header to a multiple of 128 bytes.

[0615] The bi-level layer parameters define the height of the black band, and the size of its compressed band data. The variable-size black data follows the page band header.

[0616] The contone layer parameters define the height of the contone band, and the size of its compressed page data. The variable-size contone data follows the black data.

[0617] The tag band data is the set of variable tag data half-lines as required by the tag encoder. The format of the tag data is found in Section 26.5.2. The tag band data follows the contone data.

[0618] Table 5 shows the format of the variable-size compressed band data which follows the page band header.

TABLE 5 Page band data format
field | format | description
black data | Modified G4 facsimile bitstream⁴ | Compressed bi-level layer.
contone data | JPEG bytestream | Compressed contone data layer.
tag data map | Tag data array | Tag data format. See Section 26.5.2.

[0619] The start of each variable-size segment of band data should be aligned to a 256-bit DRAM word boundary.
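The alignment rule can be sketched as a small layout calculation. The following Python example is illustrative only: it assumes that the band header and its black, contone and tag segments are laid out in that order in one region whose start is itself 256-bit aligned, and that offsets are measured in bytes from that start. The function names are hypothetical.

```python
DRAM_WORD_BYTES = 256 // 8   # one 256-bit DRAM word is 32 bytes


def align_up(offset, alignment=DRAM_WORD_BYTES):
    """Round a byte offset up to the next multiple of the alignment."""
    return (offset + alignment - 1) // alignment * alignment


def band_layout(header_size, black_size, contone_size, tag_size):
    """Return {segment name: (byte offset, byte size)} with aligned segment starts."""
    layout = {}
    offset = 0
    segments = (
        ("header", header_size),
        ("black", black_size),
        ("contone", contone_size),
        ("tag", tag_size),
    )
    for name, size in segments:
        offset = align_up(offset)      # each segment starts on a DRAM word boundary
        layout[name] = (offset, size)
        offset += size
    return layout


for name, (offset, size) in band_layout(128, 1000, 5001, 0).items():
    print(f"{name}: offset {offset}, size {size}")
```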

[0620] The following sections describe the format of the compressed bi-level layers and the compressed contone layer. Section 26.5.1 describes the format of the tag data structures.

[0621] 8.1.2.3 Bi-Level Data Compression

[0622] The (typically 1600 dpi) black bi-level layer is losslessly compressed using Silverbrook Modified Group 4 (SMG4) compression, which is a version of Group 4 Facsimile compression [22] without Huffman and with simplified run length encodings. Typically compression ratios exceed 10:1. The encodings are listed in Table 6 and Table 7.

TABLE 6 Bi-Level group 4 facsimile style compression encodings
 | Encoding | Description
same as Group 4 Facsimile | 1000 | Pass Command: a0 ← b2, skip next two edges
 | 1 | Vertical(0): a0 ← b1, color = !color
 | 110 | Vertical(1): a0 ← b1 + 1, color = !color
 | 010 | Vertical(−1): a0 ← b1 − 1, color = !color
 | 110000 | Vertical(2): a0 ← b1 + 2, color = !color
 | 010000 | Vertical(−2): a0 ← b1 − 2, color = !color
Unique to this implementation | 100000 | Vertical(3): a0 ← b1 + 3, color = !color
 | 000000 | Vertical(−3): a0 ← b1 − 3, color = !color
 | <RL><RL>100 | Horizontal: a0 ← a0 + <RL> + <RL>

[0623] SMG4 has a pass through mode to cope with local negative compression. Pass through mode is activated by a special run-length code. Pass through mode continues to either end of line or for a pre-programmed number of bits, whichever is shorter. The special run-length code is always executed as a run-length code, followed by pass through. The pass through escape code is a medium length run-length with a run of less than or equal to 31.

TABLE 7 Run length (RL) encodings
 | Encoding | Description
Unique to this implementation | RRRRR1 | Short Black Runlength (5 bits)
 | RRRRR1 | Short White Runlength (5 bits)
 | RRRRRRRRRR10 | Medium Black Runlength (10 bits)
 | RRRRRRRR10 | Medium White Runlength (8 bits)
 | RRRRRRRRRR10 | Medium Black Runlength with RRRRRRRRRR <= 31, Enter pass through
 | RRRRRRRR10 | Medium White Runlength with RRRRRRRR <= 31, Enter pass through
 | RRRRRRRRRRRRRRR00 | Long Black Runlength (15 bits)
 | RRRRRRRRRRRRRRR00 | Long White Runlength (15 bits)

[0624] Since the compression is a bitstream, the encodings are read right (least significant bit) to left (most significant bit). The run lengths given as RRRR in Table 7 are read in the same way (least significant bit at the right to most significant bit at the left).
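The right-to-left reading order can be illustrated by comparing the mode codes of Table 6 with the standard Group 4 mode codes. In the following Python sketch the Group 4 code words are quoted from ITU-T Recommendation T.6, where they are written in transmission order, and are not taken from this document; the dictionary and function names are hypothetical. Reversing each Table 6 entry reproduces the T.6 code word for the pass, horizontal and Vertical(0) to Vertical(±2) modes, while the Vertical(±3) codes differ.

```python
# Mode code words from ITU-T T.6, written in transmission order (first bit on the left).
T6_CODES = {
    "Pass": "0001",
    "Horizontal": "001",
    "Vertical(0)": "1",
    "Vertical(1)": "011",
    "Vertical(-1)": "010",
    "Vertical(2)": "000011",
    "Vertical(-2)": "000010",
    "Vertical(3)": "0000011",
    "Vertical(-3)": "0000010",
}

# Mode codes as listed in Table 6 (read right to left, least significant bit first).
SMG4_CODES_AS_LISTED = {
    "Pass": "1000",
    "Horizontal": "100",
    "Vertical(0)": "1",
    "Vertical(1)": "110",
    "Vertical(-1)": "010",
    "Vertical(2)": "110000",
    "Vertical(-2)": "010000",
    "Vertical(3)": "100000",
    "Vertical(-3)": "000000",
}


def matches_group4(mode):
    """True if the Table 6 code, read right to left, equals the T.6 code word."""
    return SMG4_CODES_AS_LISTED[mode][::-1] == T6_CODES[mode]


shared = sorted(mode for mode in SMG4_CODES_AS_LISTED if matches_group4(mode))
print("Same as Group 4:", shared)
```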

[0625] Each band of bi-level data is optionally self contained. The first line of each band therefore is based on a ‘previous’ blank line or the last line of the previous band.

[0626] 8.1.2.3.1 Group 3 and 4 Facsimile Compression

[0627] The Group 3 Facsimile compression algorithm [22] losslessly compresses bi-level data for transmission over slow and noisy telephone lines. The bi-level data represents scanned black text and graphics on a white background, and the algorithm is tuned for this class of images (it is explicitly not tuned, for example, for halftoned bi-level images). The 1D Group 3 algorithm runlength-encodes each scanline and then Huffman-encodes the resulting runlengths. Runlengths in the range 0 to 63 are coded with terminating codes. Runlengths in the range 64 to 2623 are coded with make-up codes, each representing a multiple of 64, followed by a terminating code. Runlengths exceeding 2623 are coded with multiple make-up codes followed by a terminating code. The Huffman tables are fixed, but are separately tuned for black and white runs (except for make-up codes above 1728, which are common). When possible, the 2D Group 3 algorithm encodes a scanline as a set of short edge deltas (0, ±1, ±2, ±3) with reference to the previous scanline. The delta symbols are entropy-encoded (so that the zero delta symbol is only one bit long, etc.). Edges within a 2D-encoded line which can't be delta-encoded are runlength-encoded, and are identified by a prefix. 1D- and 2D-encoded lines are marked differently. 1D-encoded lines are generated at regular intervals, whether actually required or not, to ensure that the decoder can recover from line noise with minimal image degradation. 2D Group 3 achieves compression ratios of up to 6:1 [32]. The Group 4 Facsimile algorithm [22] losslessly compresses bi-level data for transmission over error-free communications lines (i.e. the lines are truly error-free, or error-correction is done at a lower protocol level). The Group 4 algorithm is based on the 2D Group 3 algorithm, with the essential modification that since transmission is assumed to be error-free, 1D-encoded lines are no longer generated at regular intervals as an aid to error-recovery. Group 4 achieves compression ratios ranging from 20:1 to 60:1 for the CCITT set of test images [32].

[0628] The design goals and performance of the Group 4 compression algorithm qualify it as a compression algorithm for the bi-level layers. However, its Huffman tables are tuned to a lower scanning resolution (100-400 dpi), and it encodes runlengths exceeding 2623 awkwardly.

[0629] 8.1.2.4 Contone Data Compression

[0630] The contone layer (CMYK) is either a non-compressed bytestream or is compressed to an interleaved JPEG bytestream. The JPEG bytestream is complete and self-contained. It contains all data required for decompression, including quantization and Huffman tables.

[0631] The contone data is optionally converted to YCrCb before being compressed (there is no specific advantage in color-space converting if not compressing). Additionally, the CMY contone pixels are optionally converted (on an individual basis) to RGB before color conversion using R=255-C, G=255-M, B=255-Y. Optional bitwise inversion of the K plane may also be performed. Note that this CMY to RGB conversion is not intended to be accurate for display purposes, but rather for the purposes of later converting to YCrCb. The inverse transform will be applied before printing.

[0632] 8.1.2.4.1 JPEG Compression

[0633] The JPEG compression algorithm [27] lossily compresses a contone image at a specified quality level. It introduces imperceptible image degradation at compression ratios below 5:1, and negligible image degradation at compression ratios below 10:1 [33].

[0634] JPEG typically first transforms the image into a color space which separates luminance and chrominance into separate color channels. This allows the chrominance channels to be subsampled without appreciable loss because of the human visual system's relatively greater sensitivity to luminance than chrominance. After this first step, each color channel is compressed separately.

[0635] The image is divided into 8×8 pixel blocks. Each block is then transformed into the frequency domain via a discrete cosine transform (DCT). This transformation has the effect of concentrating image energy in relatively lower-frequency coefficients, which allows higher-frequency coefficients to be more crudely quantized. This quantization is the principal source of compression in JPEG. Further compression is achieved by ordering coefficients by frequency to maximize the likelihood of adjacent zero coefficients, and then runlength-encoding runs of zeroes. Finally, the runlengths and non-zero frequency coefficients are entropy coded. Decompression is the inverse process of compression.
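The energy-concentrating effect of the DCT can be seen with a direct, unoptimized implementation of the standard 8×8 forward DCT. The following Python sketch is illustrative only and is not the transform implementation used in any particular JPEG codec; the function name is hypothetical. For a uniform block, all of the energy lands in the single lowest-frequency (DC) coefficient.

```python
import math


def dct_8x8(block):
    """Forward 2D DCT-II of an 8x8 block (list of 8 rows of 8 samples)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical frequency index
        for u in range(8):      # horizontal frequency index
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * scale(u) * scale(v) * total
    return coeffs


flat_block = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat_block)
print("DC coefficient:", round(coeffs[0][0], 6))
print("Largest other coefficient:",
      max(abs(coeffs[v][u]) for v in range(8) for u in range(8) if (u, v) != (0, 0)))
```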

[0636] 8.1.2.4.2 Non-Compressed Format

[0637] If the contone data is non-compressed, it must be in a block-based format bytestream with the same pixel order as would be produced by a JPEG decoder. The bytestream therefore consists of a series of 8×8 blocks of the original image, starting with the top left 8×8 block, and working horizontally across the page (as it will be printed) until the top rightmost 8×8 block, then the next row of 8×8 blocks (left to right) and so on until the bottom row of 8×8 blocks (left to right). Each 8×8 block consists of 64 8-bit pixels for color plane 0 (representing 8 rows of 8 pixels in the order top left to bottom right) followed by 64 8-bit pixels for color plane 1 and so on for up to a maximum of 4 color planes.

[0638] If the original image is not a multiple of 8 pixels in X or Y, padding must be present (the extra pixel data will be ignored by the setting of margins).
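The block ordering and padding described above can be sketched as follows. This Python example is illustrative only: it assumes the image is supplied as separate planes, each a list of rows of 8-bit values, and it pads with zero, although the padding value is not specified since the padded pixels are ignored. The function name is hypothetical.

```python
def to_block_bytestream(planes, width, height, pad_value=0):
    """Serialize 1 to 4 color planes into the 8x8 block-based byte order."""
    if not 1 <= len(planes) <= 4:
        raise ValueError("between 1 and 4 color planes are supported")
    padded_width = -(-width // 8) * 8      # round up to a multiple of 8
    padded_height = -(-height // 8) * 8

    out = bytearray()
    for block_y in range(0, padded_height, 8):        # rows of blocks, top to bottom
        for block_x in range(0, padded_width, 8):     # blocks, left to right
            for plane in planes:                      # plane 0, then plane 1, ...
                for y in range(block_y, block_y + 8):
                    for x in range(block_x, block_x + 8):
                        inside = x < width and y < height
                        out.append(plane[y][x] if inside else pad_value)
    return bytes(out)


# A 9x8 single-plane image needs two 8x8 blocks; the second is mostly padding.
plane = [[x + 10 * y for x in range(9)] for y in range(8)]
stream = to_block_bytestream([plane], width=9, height=8)
print(len(stream), list(stream[:8]), list(stream[64:66]))
```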

[0639] 8.1.2.4.3 Compressed Format

[0640] If the contone data is compressed the first memory band containsJPEG headers (including tables) plus MCUs (minimum coded units). Theratio of space between the various color planes in the JPEG stream is1:1:1:1. No subsampling is permitted. Banding can be completelyarbitrary i.e there can be multiple JPEG images per band or 1 JPEG imagedivided over multiple bands. The break between bands is only memoryalignment based.

[0641] 8.1.2.4.4 Conversion of RGB to YCrCb (in RIP)

[0642] YCrCb is defined as per CCIR 601-1 [24] except that Y, Cr and Cb are normalized to occupy all 256 levels of an 8-bit binary encoding and take account of the actual hardware implementation of the inverse transform within SoPEC.

[0643] The exact color conversion computation is as follows:

[0644] Y*=(9805/32768)R+(19235/32768)G+(3728/32768)B

[0645] Cr*=(16375/32768)R−(13716/32768)G−(2659/32768)B+128

[0646] Cb*=−(5529/32768)R−(10846/32768)G+(16375/32768)B+128

[0647] Y, Cr and Cb are obtained by rounding to the nearest integer. There is no need for saturation since ranges of Y*, Cr* and Cb* after rounding are [0-255], [1-255] and [1-255] respectively. Note that full accuracy is possible with 24 bits. See [14] for more information.
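As an illustration only, the conversion above can be sketched in integer arithmetic. The function name is hypothetical, and rounding to the nearest integer is implemented here by adding half of the divisor (16384) before shifting right by 15 bits, which is one possible reading of "rounding to the nearest integer" rather than a description of the RIP's actual implementation.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert 8-bit RGB to 8-bit YCrCb using the coefficients given above.

    Assumption: ties round upward, i.e. floor(x + 0.5). Python's >> on
    negative integers floors, so the same expression works for Cr and Cb.
    """
    half = 1 << 14  # 0.5 in units of 1/32768
    y = (9805 * r + 19235 * g + 3728 * b + half) >> 15
    cr = ((16375 * r - 13716 * g - 2659 * b + half) >> 15) + 128
    cb = ((-5529 * r - 10846 * g + 16375 * b + half) >> 15) + 128
    return y, cr, cb
```

Because the Y coefficients sum to 32768 and the negative Cr and Cb coefficients sum to the positive one (16375), the outputs stay within the ranges quoted above without clamping.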

[0648] SoPEC ASIC

[0649] 9 Overview

[0650] The Small Office Home Office Print Engine Controller (SoPEC) is a page rendering engine ASIC that takes compressed page images as input, and produces decompressed page images at up to 6 channels of bi-level dot data as output. The bi-level dot data is generated for the Memjet bi-lithic printhead. The dot generation process takes account of printhead construction, dead nozzles, and allows for fixative generation.

[0651] A single SoPEC can control 2 bi-lithic printheads and up to 6 color channels at 10,000 lines/sec, equating to 30 pages per minute. A single SoPEC can perform full-bleed printing of A3, A4 and Letter pages. The 6 channels of colored ink are the expected maximum in a consumer SOHO, or office Bi-lithic printing environment:

[0652] CMY, for regular color printing.

[0653] K, for black text, line graphics and gray-scale printing.

[0654] IR (infrared), for Netpage-enabled [5] applications.

[0655] F (fixative), to enable printing at high speed. Because the bi-lithic printer is capable of printing so fast, a fixative may be required to enable the ink to dry before the page touches the page already printed. Otherwise the pages may bleed on each other. In low speed printing environments the fixative may not be required.

[0656] SoPEC is color space agnostic. Although it can accept contone data as CMYX or RGBX, where X is an optional 4th channel, it also can accept contone data in any print color space. Additionally, SoPEC provides a mechanism for arbitrary mapping of input channels to output channels, including combining dots for ink optimization, generation of channels based on any number of other channels etc. However, inputs are typically CMYK for contone input, K for the bi-level input, and the optional Netpage tag dots are typically rendered to an infra-red layer. A fixative channel is typically generated for fast printing applications.

[0657] SoPEC is resolution agnostic. It merely provides a mapping between input resolutions and output resolutions by means of scale factors. The expected output resolution is 1600 dpi, but SoPEC actually has no knowledge of the physical resolution of the Bi-lithic printhead.

[0658] SoPEC is page-length agnostic. Successive pages are typically split into bands and downloaded into the page store as each band of information is consumed and becomes free.

[0659] SoPEC provides an interface for synchronization with other SoPECs. This allows simple multi-SoPEC solutions for simultaneous A3/A4/Letter duplex printing. However, SoPEC is also capable of printing only a portion of a page image. Combining synchronization functionality with partial page rendering allows multiple SoPECs to be readily combined for alternative printing requirements including simultaneous duplex printing and wide format printing.

[0660] Table 8 lists some of the features and corresponding benefits of SoPEC.

TABLE 8 Features and Benefits of SoPEC

Feature | Benefits
Optimised print architecture in hardware | 30 ppm full page photographic quality color printing from a desktop PC
0.13 micron CMOS (>3 million transistors) | High speed; Low cost; High functionality
900 Million dots per second | Extremely fast page generation
10,000 lines per second at 1600 dpi | 0.5 A4/Letter pages per SoPEC chip per second
1 chip drives up to 133,920 nozzles | Low cost page-width printers
1 chip drives up to 6 color planes | 99% of SoHo printers can use 1 SoPEC device
Integrated DRAM | No external memory required, leading to low cost systems
Power saving sleep mode | SoPEC can enter a power saving sleep mode to reduce power dissipation between print jobs
JPEG expansion | Low bandwidth from PC; Low memory requirements in printer
Lossless bitplane expansion | High resolution text and line art with low bandwidth from PC (e.g. over USB)
Netpage tag expansion | Generates interactive paper
Stochastic dispersed dot dither | Optically smooth image quality; No moire effects
Hardware compositor for 6 image planes | Pages composited in real-time
Dead nozzle compensation | Extends printhead life and yield; Reduces printhead cost
Color space agnostic | Compatible with all inksets and image sources including RGB, CMYK, spot, CIE L*a*b*, hexachrome, YCrCbK, sRGB and other
Color space conversion | Higher quality / lower bandwidth
Computer interface | USB1.1 interface to host and ISI interface to ISI-Bridge chip thereby allowing connection to IEEE 1394, Bluetooth etc.
Cascadable in resolution | Printers of any resolution
Cascadable in color depth | Special color sets e.g. hexachrome can be used
Cascadable in image size | Printers of any width up to 16 inches
Cascadable in pages | Printers can print both sides simultaneously
Cascadable in speed | Higher speeds are possible by having each SoPEC print one vertical strip of the page
Fixative channel data generation | Extremely fast ink drying without wastage
Built-in security | Revenue models are protected
Undercolor removal on dot-by-dot basis | Reduced ink usage
Does not require fonts for high speed operation | No font substitution or missing fonts
Flexible printhead configuration | Many configurations of printheads are supported by one chip type
Drives Bi-lithic printheads directly | No print driver chips required, results in lower cost
Determines dot accurate ink usage | Removes need for physical ink monitoring system in ink cartridges

[0661] 9.1 Printing Rates

[0662] The required printing rate for SoPEC is 30 sheets per minute with an inter-sheet spacing of 4 cm. To achieve a 30 sheets per minute print rate, this requires:

[0663] 2 sec/(300 mm×63 dot/mm)=105.8 μseconds per line, with no inter-sheet gap.

[0664] 2 sec/(340 mm×63 dot/mm)=93.3 μseconds per line, with a 4 cm inter-sheet gap.

[0665] A printline for an A4 page consists of 13824 nozzles across the page [2]. At a system clock rate of 160 MHz 13824 dots of data can be generated in 86.4 μseconds. Therefore data can be generated fast enough to meet the printing speed requirement. It is necessary to deliver this print data to the print-heads.

[0666] Printheads can be made up of 5:5, 6:4, 7:3 and 8:2 inch printhead combinations [2]. Print data is transferred to both print heads in a pair simultaneously. This means the longest time to print a line is determined by the time to transfer print data to the longest print segment. There are 9744 nozzles across a 7 inch printhead. The print data is transferred to the printhead at a rate of 106 MHz (2/3 of the system clock rate) per color plane. This means that it will take 91.9 μs to transfer a single line for a 7:3 printhead configuration. So we can meet the requirement of 30 sheets per minute printing with a 4 cm gap with a 7:3 printhead combination. There are 11160 nozzles across an 8 inch printhead. To transfer the data to the printhead at 106 MHz will take 105.3 μs. So an 8:2 printhead combination printing with an inter-sheet gap will print slower than 30 sheets per minute.
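The arithmetic in this section can be checked with a short sketch. The constant and function names are illustrative only; the figures (63 dots/mm, 2 seconds per sheet, a 106 MHz transfer rate, and the nozzle counts) are taken from the text above.

```python
DOTS_PER_MM = 63        # dot pitch used in the line-time calculation
SHEET_TIME_S = 2.0      # 30 sheets per minute
TRANSFER_HZ = 106e6     # printhead data rate per color plane


def line_budget_us(feed_mm):
    """Time available per line, in microseconds, for a given feed length."""
    return SHEET_TIME_S / (feed_mm * DOTS_PER_MM) * 1e6


def transfer_time_us(nozzles, rate_hz=TRANSFER_HZ):
    """Time to clock one line of dot data into a printhead segment."""
    return nozzles / rate_hz * 1e6


def meets_rate(longest_segment_nozzles, feed_mm):
    """True if the longest segment can be loaded within the line budget."""
    return transfer_time_us(longest_segment_nozzles) <= line_budget_us(feed_mm)
```

With a 340 mm feed (page plus 4 cm gap), `meets_rate(9744, 340)` holds for the 7 inch segment while `meets_rate(11160, 340)` does not for the 8 inch segment, which matches the conclusion above.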

[0667] 9.2 SoPEC Basic Architecture

[0668] From the highest point of view the SoPEC device consists of 3 distinct subsystems:

[0669] CPU Subsystem

[0670] DRAM Subsystem

[0671] Print Engine Pipeline (PEP) Subsystem

[0672] See FIG. 13 for a block level diagram of SoPEC.

[0673] 9.2.1 CPU Subsystem

[0674] The CPU subsystem controls and configures all aspects of the other subsystems. It provides general support for interfacing and synchronising the external printer with the internal print engine. It also controls the low speed communication to the QA chips. The CPU subsystem contains various peripherals to aid the CPU, such as GPIO (includes motor control), interrupt controller, LSS Master and general timers. The Serial Communications Block (SCB) on the CPU subsystem provides a full speed USB1.1 interface to the host as well as an Inter SoPEC Interface (ISI) to other SoPEC devices.

[0675] 9.2.2 DRAM Subsystem

[0676] The DRAM subsystem accepts requests from the CPU, Serial Communications Block (SCB) and blocks within the PEP subsystem. The DRAM subsystem (in particular the DIU) arbitrates the various requests and determines which request should win access to the DRAM. The DIU arbitrates based on configured parameters, to allow sufficient access to DRAM for all requesters. The DIU also hides the implementation specifics of the DRAM such as page size, number of banks, refresh rates etc.

[0677] 9.2.3 Print Engine Pipeline (PEP) Subsystem

[0678] The Print Engine Pipeline (PEP) subsystem accepts compressed pages from DRAM and renders them to bi-level dots for a given print line destined for a printhead interface that communicates directly with up to 2 segments of a bi-lithic printhead.

[0679] The first stage of the page expansion pipeline is the CDU, LBD and TE. The CDU expands the JPEG-compressed contone (typically CMYK) layer, the LBD expands the compressed bi-level layer (typically K), and the TE encodes Netpage tags for later rendering (typically in IR or K ink). The output from the first stage is a set of buffers: the CFU, SFU, and TFU. The CFU and SFU buffers are implemented in DRAM.

[0680] The second stage is the HCU, which dithers the contone layer, and composites position tags and the bi-level spot0 layer over the resulting bi-level dithered layer. A number of options exist for the way in which compositing occurs. Up to 6 channels of bi-level data are produced from this stage. Note that not all 6 channels may be present on the printhead. For example, the printhead may be CMY only, with K pushed into the CMY channels and IR ignored. Alternatively, the position tags may be printed in K if IR ink is not available (or for testing purposes).

[0681] The third stage (DNC) compensates for dead nozzles in the printhead by color redundancy and error diffusing dead nozzle data into surrounding dots.

[0682] The resultant bi-level 6 channel dot-data (typically CMYK-IRF) is buffered and written out to a set of line buffers stored in DRAM via the DWU.

[0683] Finally, the dot-data is loaded back from DRAM, and passed to the printhead interface via a dot FIFO. The dot FIFO accepts data from the LLU at the system clock rate (pclk), while the PHI removes data from the FIFO and sends it to the printhead at a rate of ⅔ times the system clock rate (see Section 9.1).

[0684] 9.3 SoPEC Block Description

[0685] Looking at FIG. 13, the various units are described here in summary form:

TABLE 9 Units within SoPEC

Subsystem | Unit Acronym | Unit Name | Description
DRAM | DIU | DRAM interface unit | Provides the interface for DRAM read and write access for the various SoPEC units, CPU and the SCB block. The DIU provides arbitration between competing units and controls DRAM access.
DRAM | DRAM | Embedded DRAM | 20 Mbits of embedded DRAM
CPU | CPU | Central Processing Unit | CPU for system configuration and control
CPU | MMU | Memory Management Unit | Limits access to certain memory address areas in CPU user mode
CPU | RDU | Real-time Debug Unit | Facilitates the observation of the contents of most of the CPU addressable registers in SoPEC in addition to some pseudo-registers in realtime
CPU | TIM | General Timer | Contains watchdog and general system timers
CPU | LSS | Low Speed Serial Interfaces | Low level controller for interfacing with the QA chips
CPU | GPIO | General Purpose IOs | General IO controller, with built-in Motor control unit, LED pulse units and de-glitch circuitry
CPU | ROM | Boot ROM | 16 KBytes of System Boot ROM code
CPU | ICU | Interrupt Controller Unit | General Purpose interrupt controller with configurable priority, and masking
CPU | CPR | Clock, Power and Reset block | Central Unit for controlling and generating the system clocks and resets and powerdown mechanisms
CPU | PSS | Power Save Storage | Storage retained while system is powered down
CPU | USB | Universal Serial Bus Device | USB device controller for interfacing with the host USB
CPU | ISI | Inter-SoPEC Interface | ISI controller for data and control communication with other SoPECs in a multi-SoPEC system
CPU | SCB | Serial Communication Block | Contains both the USB and ISI blocks
Print Engine Pipeline (PEP) | PCU | PEP controller | Provides external CPU with the means to read and write PEP Unit registers, and read and write DRAM in single 32-bit chunks
PEP | CDU | Contone decoder unit | Expands JPEG compressed contone layer and writes decompressed contone to DRAM
PEP | CFU | Contone FIFO Unit | Provides line buffering between CDU and HCU
PEP | LBD | Lossless Bi-level Decoder | Expands compressed bi-level layer
PEP | SFU | Spot FIFO Unit | Provides line buffering between LBD and HCU
PEP | TE | Tag encoder | Encodes tag data into line of tag dots
PEP | TFU | Tag FIFO Unit | Provides tag data storage between TE and HCU
PEP | HCU | Halftoner compositor unit | Dithers contone layer and composites the bi-level spot 0 and position tag dots
PEP | DNC | Dead Nozzle Compensator | Compensates for dead nozzles by color redundancy and error diffusing dead nozzle data into surrounding dots
PEP | DWU | Dotline Writer Unit | Writes out the 6 channels of dot data for a given printline to the line store DRAM
PEP | LLU | Line Loader Unit | Reads the expanded page image from line store, formatting the data appropriately for the bi-lithic printhead
PEP | PHI | PrintHead Interface | Is responsible for sending dot data to the bi-lithic printheads and for providing line synchronization between multiple SoPECs. Also provides test interface to printhead such as temperature monitoring and Dead Nozzle Identification.

[0686] 9.4 Addressing Scheme in SoPEC

[0687] SoPEC must address:

[0688] 20 Mbit DRAM.

[0689] PCU addressed registers in PEP.

[0690] CPU-subsystem addressed registers.

[0691] SoPEC has a unified address space with the CPU capable of addressing all CPU-subsystem and PCU-bus accessible registers (in PEP) and all locations in DRAM. The CPU generates byte-aligned addresses for the whole of SoPEC.

[0692] 22 bits are sufficient to byte address the whole SoPEC address space.

[0693] 9.4.1 DRAM Addressing Scheme

[0694] The embedded DRAM is composed of 256-bit words. However the CPU-subsystem may need to write individual bytes of DRAM. Therefore it was decided to make the DIU byte addressable. 22 bits are required to byte address 20 Mbits of DRAM. Most blocks read or write 256-bit words of DRAM. Therefore only the top 17 bits i.e. bits 21 to 5 are required to address 256-bit word aligned locations.

[0695] The exceptions are:

[0696] The CDU, which can write 64 bits, so only the top 19 address bits, i.e. bits 21-3, are required.

[0697] The CPU-subsystem always generates a 22-bit byte-aligned DIU address but it will send flags to the DIU indicating whether it is an 8, 16 or 32-bit write.

[0698] All DIU accesses must be within the same 256-bit aligned DRAM word.
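The bit widths quoted in this section follow from the DRAM size and word width, as the sketch below shows. All names are hypothetical, and the sketch assumes that 1 Mbit means 2^20 bits; the 22-bit result is the same if 10^6 bits is assumed instead.

```python
DRAM_BITS = 20 * 2**20            # assumption: 1 Mbit = 2**20 bits
DRAM_BYTES = DRAM_BITS // 8       # 2,621,440 bytes
WORD_BYTES = 256 // 8             # one 256-bit DRAM word is 32 bytes

# Number of address bits needed to reach the last byte of DRAM.
BYTE_ADDR_BITS = (DRAM_BYTES - 1).bit_length()


def word_address(byte_addr):
    """256-bit word index: address bits 21..5 of the byte address."""
    return byte_addr >> 5


def cdu_address(byte_addr):
    """64-bit aligned index used by the CDU: address bits 21..3."""
    return byte_addr >> 3


def within_one_word(byte_addr, nbytes):
    """True if an access of nbytes stays inside a single 256-bit word."""
    return word_address(byte_addr) == word_address(byte_addr + nbytes - 1)
```

For example, a 32-bit write at byte address 0x1C stays within word 0, whereas the same write at 0x1E would straddle words 0 and 1 and so would violate the constraint above.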

[0699] 9.4.2 PEP Unit DRAM Addressing

[0700] PEP Unit configuration registers which specify DRAM locations should specify 256-bit aligned DRAM addresses i.e. using address bits 21:5. Legacy blocks from PEC1 e.g. the LBD and TE may need to specify 64-bit aligned DRAM addresses if these reused blocks' DRAM addressing is difficult to modify. These 64-bit aligned addresses require address bits 21:3. However, these 64-bit aligned addresses should be programmed to start at a 256-bit DRAM word boundary.

[0701] Unlike PEC1, there are no constraints in SoPEC on data organization in DRAM except that all data structures must start on a 256-bit DRAM boundary. If data stored is not a multiple of 256-bits then the last word should be padded.

[0702] 9.4.3 CPU Subsystem Bus Addressed Registers

[0703] The CPU subsystem bus supports 32-bit word aligned read and write accesses with variable access timings. See section 11.4 for more details of the access protocol used on this bus. The CPU subsystem bus does not currently support byte reads and writes but this can be added at a later date if required by imported IP.

[0704] 9.4.4 PCU Addressed Registers in PEP

[0705] The PCU only supports 32-bit register reads and writes for the PEP blocks. As the PEP blocks only occupy a subsection of the overall address map and the PCU is explicitly selected by the MMU when a PEP block is being accessed, the PCU does not need to perform a decode of the higher-order address bits. See Table 11 for the PEP subsystem address map.

[0706] 9.5 SoPEC Memory Map

[0707] 9.5.1 Main Memory Map

[0708] The system wide memory map is shown in FIG. 14 below. The memory map is discussed in detail in Section 11, Central Processing Unit (CPU).

[0709] 9.5.2 CPU-Bus Peripherals Address Map

[0710] The address mapping for the peripherals attached to the CPU-bus is shown in Table 10 below. The MMU performs the decode of cpu_adr[21:12] to generate the relevant cpu_block_select signal for each block. The addressed blocks decode however many of the lower order bits of cpu_adr[11:2] are required to address all the registers within the block.

TABLE 10 CPU-bus peripherals address map

Block_base | Address
ROM_base | 0x0000_0000
MMU_base | 0x0001_0000
TIM_base | 0x0001_1000
LSS_base | 0x0001_2000
GPIO_base | 0x0001_3000
SCB_base | 0x0001_4000
ICU_base | 0x0001_5000
CPR_base | 0x0001_6000
DIU_base | 0x0001_7000
PSS_base | 0x0001_8000
Reserved | 0x0001_9000 to 0x0001_FFFF
PCU_base | 0x0002_0000 to 0x0002_BFFF

[0711] 9.5.3 PCU Mapped Registers (PEP Blocks) Address Map

[0712] The PEP blocks are addressed via the PCU. From FIG. 14, the PCU mapped registers are in the range 0x0002_0000 to 0x0002_BFFF. From Table 11 it can be seen that there are 12 sub-blocks within the PCU address space. Therefore, only four bits are necessary to address each of the sub-blocks within the PEP part of SoPEC. A further 12 bits may be used to address any configurable register within a PEP block. This gives scope for 1024 configurable registers per sub-block (the PCU mapped registers are all 32-bit addressed registers so the upper 10 bits are required to individually address them). This address will come either from the CPU or from a command stored in DRAM. The bus is assembled as follows:

[0713] address[15:12]=sub-block address,

[0714] address[n:2]=register address within sub-block, only the number of bits required to decode the registers within each sub-block are used,

[0715] address[1:0]=byte address, unused as PCU mapped registers are all 32-bit addressed registers.

[0716] So for the case of the HCU, its addresses range from 0x7000 to 0x7FFF within the PEP subsystem or from 0x0002_7000 to 0x0002_7FFF in the overall system.

TABLE 11 PEP blocks address map

Block_base | Address
PCU_base | 0x0002_0000
CDU_base | 0x0002_1000
CFU_base | 0x0002_2000
LBD_base | 0x0002_3000
SFU_base | 0x0002_4000
TE_base | 0x0002_5000
TFU_base | 0x0002_6000
HCU_base | 0x0002_7000
DNC_base | 0x0002_8000
DWU_base | 0x0002_9000
LLU_base | 0x0002_A000
PHI_base | 0x0002_B000 to 0x0002_BFFF
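The address assembly described above can be illustrated with a small decoder. This is a sketch only: the function name is hypothetical, and the block list simply restates the order of Table 11.

```python
# Sub-block order taken from Table 11; the list index equals address[15:12].
PEP_BLOCKS = ["PCU", "CDU", "CFU", "LBD", "SFU", "TE",
              "TFU", "HCU", "DNC", "DWU", "LLU", "PHI"]
PCU_FIRST = 0x0002_0000
PCU_LAST = 0x0002_BFFF


def decode_pep_address(addr):
    """Split a system address into (PEP block name, 32-bit register index)."""
    if not PCU_FIRST <= addr <= PCU_LAST:
        raise ValueError("address is outside the PCU mapped range")
    sub_block = (addr >> 12) & 0xF     # address[15:12]
    register = (addr >> 2) & 0x3FF     # address[11:2], up to 1024 registers
    return PEP_BLOCKS[sub_block], register
```

Address bits [1:0] are discarded because the PCU mapped registers are all 32-bit addressed, so `decode_pep_address(0x0002_7000)` and `decode_pep_address(0x0002_7003)` refer to the same HCU register.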

[0717] 9.6 Buffer Management in SoPEC

[0718] As outlined in Section 9.1, SoPEC has a requirement to print 1 side every 2 seconds i.e. 30 sides per minute.

[0719] 9.6.1 Page Buffering

[0720] Approximately 2 Mbytes of DRAM are reserved for compressed page buffering in SoPEC. If a page is compressed to fit within 2 Mbyte then a complete page can be transferred to DRAM before printing. However, the time to transfer 2 Mbyte using USB 1.1 is approximately 2 seconds. The worst case cycle time to print a page then approaches 4 seconds. This reduces the worst-case print speed to 15 pages per minute.

[0721] 9.6.2 Band Buffering

[0722] The SoPEC page-expansion blocks support the notion of page banding. The page can be divided into bands and another band can be sent down to SoPEC while we are printing the current band. Therefore we can start printing once at least one band has been downloaded.

[0723] The band size granularity should be carefully chosen to allow efficient use of the USB bandwidth and DRAM buffer space. It should be small enough to allow seamless 30 sides per minute printing but not so small as to introduce excessive CPU overhead in orchestrating the data transfer and parsing the band headers. Band-finish interrupts have been provided to notify the CPU of free buffer space. It is likely that the host PC will supervise the band transfer and buffer management instead of the SoPEC CPU.

[0724] If SoPEC starts printing before the complete page has been transferred to memory there is a risk of a buffer underrun occurring if subsequent bands are not transferred to SoPEC in time e.g. due to insufficient USB bandwidth caused by another USB peripheral consuming USB bandwidth. A buffer underrun occurs if a line synchronisation pulse is received before a line of data has been transferred to the printhead and causes the print job to fail at that line. If there is no risk of buffer underrun then printing can safely start once at least one band has been downloaded.

[0725] If there is a risk of a buffer underrun occurring due to an interruption of compressed page data transfer, then the safest approach is to only start printing once we have loaded up the data for a complete page. This means that a worst case latency in the region of 2 seconds (with USB1.1) will be incurred before printing the first page. Subsequent pages will take 2 seconds to print giving us the required sustained printing rate of 30 sides per minute.
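The two page rates quoted in this section (15 pages per minute when download and printing are serialised, 30 when the next page is downloaded while the current one prints) can be reproduced with a simple model. The function and its parameters are hypothetical, and the model ignores any overhead other than transfer and print time.

```python
def pages_per_minute(transfer_s, print_s, overlapped):
    """Sustained page rate for a given transfer time and print time.

    If overlapped is True, the download of the next page is assumed to run
    concurrently with printing, so the slower of the two sets the cycle time.
    Otherwise the two steps are assumed to run back to back.
    """
    cycle_s = max(transfer_s, print_s) if overlapped else transfer_s + print_s
    return 60.0 / cycle_s
```

With the approximate 2 second figures from the text, the serialised case gives 15 pages per minute and the overlapped case gives 30.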

[0726] A Storage SoPEC (Section 7.2.5) could be added to the system to provide guaranteed bandwidth data delivery. The print system could also be constructed using an ISI-Bridge chip (Section 7.2.6) to provide guaranteed data delivery.

[0727] The most efficient page banding strategy is likely to be determined on a per page/print job basis and so SoPEC will support the use of bands of any size.

[0728] 10 SoPEC Use Cases

[0729] 10.1 Introduction

[0730] This chapter is intended to give an overview of a representative set of scenarios or use cases which SoPEC can perform. SoPEC is by no means restricted to the particular use cases described and not every SoPEC system is considered here.

[0731] In this chapter we discuss SoPEC use cases under four headings:

[0732] 1) Normal operation use cases.

[0733] 2) Security use cases.

[0734] 3) Miscellaneous use cases.

[0735] 4) Failure mode use cases.

[0736] Use cases for both single and multi-SoPEC systems are outlined.

[0737] Some tasks may be composed of a number of sub-tasks.

[0738] The realtime requirements for SoPEC software tasks are discussed in “11 Central Processing Unit (CPU)” under Section 11.3 Realtime requirements.

[0739] 10.2 Normal Operation in a Single SoPEC System with USB Host Connection

[0740] SoPEC operation is broken up into a number of sections which are outlined below. Buffer management in a SoPEC system is normally performed by the host.

[0741] 10.2.1 Powerup

[0742] Powerup describes SoPEC initialisation following an external reset or the watchdog timer system reset.

[0743] A typical powerup sequence is:

[0744] 1) Execute reset sequence for complete SoPEC.

[0745] 2) CPU boot from ROM.

[0746] 3) Basic configuration of CPU peripherals, SCB and DIU. DRAM initialisation. USB Wakeup.

[0747] 4) Download and authentication of program (see Section 10.5.2).

[0748] 5) Execution of program from DRAM.

[0749] 6) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

[0750] 7) Download and authenticate any further datasets.

[0751] 10.2.2 USB Wakeup

[0752] The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block (chapter 16). Normally the CPU sub-system and the DRAM will be put in sleep mode but the SCB and power-safe storage (PSS) will still be enabled.

[0753] Wakeup describes SoPEC recovery from sleep mode with the SCB and power-safe storage (PSS) still enabled. In a single SoPEC system, wakeup can be initiated following a USB reset from the SCB.

[0754] A typical USB wakeup sequence is:

[0755] 1) Execute reset sequence for sections of SoPEC in sleep mode.

[0756] 2) CPU boot from ROM, if CPU-subsystem was in sleep mode.

[0757] 3) Basic configuration of CPU peripherals and DIU, and DRAM initialisation, if required.

[0758] 4) Download and authentication of program using results in Power-Safe Storage (PSS) (see Section 10.5.2).

[0759] 5) Execution of program from DRAM.

[0760] 6) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

[0761] 7) Download and authenticate using results in PSS of any further datasets (programs).

[0762] 10.2.3 Print Initialization

[0763] This sequence is typically performed at the start of a print job following powerup or wakeup:

[0764] 1) Check amount of ink remaining via QA chips.

[0765] 2) Download static data e.g. dither matrices, dead nozzle tables from host to DRAM.

[0766] 3) Check printhead temperature, if required, and configure printhead with firing pulse profile etc. accordingly.

[0767] 4) Initiate printhead pre-heat sequence, if required.

[0768] 10.2.4 First Page Download

[0769] Buffer management in a SoPEC system is normally performed by the host.

[0770] First page, first band download and processing:

[0771] 1) The host communicates to the SoPEC CPU over the USB to check that DRAM space remaining is sufficient to download the first band.

[0772] 2) The host downloads the first band (with the page header) to DRAM.

[0773] 3) When the complete page header has been downloaded the SoPEC CPU processes the page header, calculates PEP register commands and writes directly to PEP registers or to DRAM.

[0774] 4) If PEP register commands have been written to DRAM, execute PEP commands from DRAM via PCU.

[0775] Remaining bands download and processing:

[0776] 1) Check DRAM space remaining is sufficient to download the next band.

[0777] 2) Download the next band with the band header to DRAM.

[0778] 3) When the complete band header has been downloaded, process the band header according to whichever band-related register updating mechanism is being used.

[0779] 10.2.5 Start Printing

[0780] 1) Wait until at least one band of the first page has been downloaded. One approach is to only start printing once we have loaded up the data for a complete page. If we start printing before the complete page has been transferred to memory we run the risk of a buffer underrun occurring because compressed page data was not transferred to SoPEC in time e.g. due to insufficient USB bandwidth caused by another USB peripheral consuming USB bandwidth.

[0781] 2) Start all the PEP Units by writing to their Go registers, via PCU commands executed from DRAM or direct CPU writes. A rapid startup order for the PEP units is outlined in Table 12.

TABLE 12 Typical PEP Unit startup order for printing a page

Step# | Unit
1 | DNC
2 | DWU
3 | HCU
4 | PHI
5 | LLU
6 | CFU, SFU, TFU
7 | CDU
8 | TE, LBD
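As a rough sketch of how software might apply the Table 12 ordering, the helper below walks the steps in sequence and asks a caller-supplied function to assert each unit's Go register. The `set_go` callback and its signature are hypothetical; the source does not specify the Go register offsets or the mechanism (PCU command or direct CPU write) in enough detail to show the actual register access.

```python
# Startup steps from Table 12; units sharing a step are grouped in one tuple.
STARTUP_ORDER = [
    ("DNC",), ("DWU",), ("HCU",), ("PHI",), ("LLU",),
    ("CFU", "SFU", "TFU"), ("CDU",), ("TE", "LBD"),
]


def start_pep_units(set_go):
    """Assert Go for every PEP unit in Table 12 order.

    set_go(unit, value) is a hypothetical callback standing in for the PCU
    command or CPU write that sets a unit's Go register.
    """
    started = []
    for step in STARTUP_ORDER:
        for unit in step:
            set_go(unit, 1)
            started.append(unit)
    return started
```

A caller could pass a function that records the writes for testing, or one that issues the real register writes on hardware.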

[0782] 3) Print ready interrupt occurs (from PHI).

[0783] 4) Start motor control, if first page, otherwise feed the next page. This step could occur before the print ready interrupt.

[0784] 5) Drive LEDs, monitor paper status.

[0785] 6) Wait for page alignment via page sensor(s) GPIO interrupt.

[0786] 7) CPU instructs PHI to start producing line syncs and hence commence printing, or wait for an external device to produce line syncs.

[0787] 8) Continue to download bands and process page and band headers for next page.

[0788] 10.2.6 Next Page(s) Download

[0789] As for first page download, performed during printing of current page.

[0790] 10.2.7 Between Bands

[0791] When the finished band flags are asserted, band related registers in the CDU, LBD, TE need to be re-programmed before the subsequent band can be printed. This can be via PCU commands from DRAM. Typically only 3-5 commands per decompression unit need to be executed. These registers can also be reprogrammed directly by the CPU or most likely by updating from shadow registers. The finished band flag interrupts the CPU to tell the CPU that the area of memory associated with the band is now free.

[0792] 10.2.8 During Page Print

[0793] Typically during page printing ink usage is communicated to the QA chips.

[0794] 1) Calculate ink printed (from PHI).

[0795] 2) Decrement ink remaining (via QA chips).

[0796] 3) Check amount of ink remaining (via QA chips). This operation may be better performed while the page is being printed rather than at the end of the page.

[0797] 10.2.9 Page Finish

[0798] These operations are typically performed when the page is finished:

[0799] 1) Page finished interrupt occurs from PHI.

[0800] 2) Shutdown the PEP blocks by de-asserting their Go registers. A typical shutdown order is defined in Table 13. This will set the PEP Unit state-machines to their idle states without resetting their configuration registers.

[0801] 3) Communicate ink usage to QA chips, if required.

TABLE 13 End of page shutdown order for PEP Units

Step# | Unit
1 | PHI (will shutdown by itself in the normal case at the end of a page)
2 | DWU (shutting this down stalls the DNC and therefore the HCU and above)
3 | LLU (should already be halted due to PHI at end of last line of page)
4 | TE (this is the only dot supplier likely to be running, halted by the HCU)
5 | CDU (this is likely to already be halted due to end of contone band)
6 | CFU, SFU, TFU, LBD (order unimportant, and should already be halted due to end of band)
7 | HCU, DNC (order unimportant, should already have halted)

[0802] 10.2.10 Start of Next Page

[0803] These operations are typically performed before printing the next page:

[0804] 1) Re-program the PEP Units via PCU command processing from DRAM based on page header.

[0805] 2) Go to Start printing.

[0806] 10.2.11 End of Document

[0807] 1) Stop motor control.

[0808] 10.2.12 Sleep Mode

[0809] The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block described in Section 16.

[0810] 1) Instruct host PC via USB that SoPEC is about to sleep.

[0811] 2) Store reusable authentication results in Power-Safe Storage (PSS).

[0812] 3) Put SoPEC into defined sleep mode.

[0813] 10.3 Normal Operation in a Multi-SoPEC System—ISIMaster SoPEC

[0814] In a multi-SoPEC system the host generally manages program and compressed page download to all the SoPECs. Inter-SoPEC communication is over the ISI link which will add a latency. In the case of a multi-SoPEC system with just one USB 1.1 connection, the SoPEC with the USB connection is the ISIMaster. The ISI-bridge chip is the ISIMaster in the case of an ISI-Bridge SoPEC configuration. While it is perfectly possible for an ISISlave to have a direct USB connection to the host we do not treat this scenario explicitly here to avoid possible confusion.

[0815] In a multi-SoPEC system one of the SoPECs will be the PrintMaster. This SoPEC must manage and control sensors and actuators e.g. motor control. These sensors and actuators could be distributed over all the SoPECs in the system. An ISIMaster SoPEC may also be the PrintMaster SoPEC.

[0816] In a multi-SoPEC system each printing SoPEC will generally have its own PRINTER_QA chip (or at least access to a PRINTER_QA chip that contains the SoPEC's SoPEC_id_key) to validate operating parameters and ink usage. The results of these operations may be communicated to the PrintMaster SoPEC.

[0817] In general the ISIMaster may need to be able to:

[0818] Send messages to the ISISlaves which will cause the ISISlaves to send their status to the ISIMaster.

[0819] Instruct the ISISlaves to perform certain operations.

[0820] As the ISI is an insecure interface, commands issued over the ISI are regarded as user mode commands. Supervisor mode code running on the SoPEC CPUs will allow or disallow these commands. The software protocol needs to be constructed with this in mind.

[0821] The ISIMaster will initiate all communication with the ISISlaves.

[0822] SoPEC operation is broken up into a number of sections which are outlined below.

[0823] 10.3.1 Powerup

[0824] Powerup describes SoPEC initialisation following an external reset or the watchdog timer system reset.

[0825] 1) Execute reset sequence for complete SoPEC.

[0826] 2) CPU boot from ROM.

[0827] 3) Basic configuration of CPU peripherals, SCB and DIU. DRAM initialisation. USB Wakeup.

[0828] 4) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster (unless the SoPEC CPU has explicitly disabled this function).

[0829] 5) Download and authentication of program (see Section 10.5.3).

[0830] 6) Execution of program from DRAM.

[0831] 7) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

[0832] 8) Download and authenticate any further datasets (programs).

[0833] 9) The initial dataset may be broadcast to all the ISISlaves.

[0834] 10) The ISIMaster SoPEC then waits for a short time to allow the authentication to take place on the ISISlave SoPECs.

[0835] 11) Each ISISlave SoPEC is polled for the result of its program code authentication process.

[0836] 12) If all ISISlaves report successful authentication the OEM code module can be distributed and authenticated. OEM code will most likely reside on one SoPEC.

[0837] 10.3.2 USB Wakeup

[0838] The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block [16]. Normally the CPU sub-system and the DRAM will be put in sleep mode but the SCB and power-safe storage (PSS) will still be enabled.

[0839] Wakeup describes SoPEC recovery from sleep mode with the SCB and power-safe storage (PSS) still enabled. For an ISIMaster SoPEC connected to the host via USB, wakeup can be initiated following a USB reset from the SCB.

[0840] A typical USB wakeup sequence is:

[0841] 1) Execute reset sequence for sections of SoPEC in sleep mode.

[0842] 2) CPU boot from ROM, if CPU-subsystem was in sleep mode.

[0843] 3) Basic configuration of CPU peripherals and DIU, and DRAM initialisation, if required.

[0844] 4) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster (unless the SoPEC CPU has explicitly disabled this function).

[0845] 5) Download and authentication of program using results in Power-Safe Storage (PSS) (see Section 10.5.3).

[0846] 6) Execution of program from DRAM.

[0847] 7) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

[0848] 8) Download and authenticate any further datasets (programs) using results in Power-Safe Storage (PSS) (see Section 10.5.3).

[0849] 9) Following steps as per Powerup.

[0850] 10.3.3 Print Initialization

[0851] This sequence is typically performed at the start of a print job following powerup or wakeup:

[0852] 1) Check amount of ink remaining via QA chips, which may be present on an ISISlave SoPEC.

[0853] 2) Download static data, e.g. dither matrices and dead nozzle tables, from host to DRAM.

[0854] 3) Check printhead temperature, if required, and configure printhead with firing pulse profile etc. accordingly. Instruct ISISlaves to also perform this operation.

[0855] 4) Initiate printhead pre-heat sequence, if required. Instruct ISISlaves to also perform this operation.

[0856] 10.3.4 First Page Download

[0857] Buffer management in a SoPEC system is normally performed by the host.

[0858] 1) The host communicates to the SoPEC CPU over the USB to check that DRAM space remaining is sufficient to download the first band.

[0859] 2) The host downloads the first band (with the page header) to DRAM.

[0860] 3) When the complete page header has been downloaded the SoPEC CPU processes the page header, calculates PEP register commands and writes them directly to PEP registers or to DRAM.

[0861] 4) If PEP register commands have been written to DRAM, execute PEP commands from DRAM via PCU.

[0862] Poll ISISlaves for DRAM status and download compressed data to ISISlaves.

[0863] Remaining first page bands download and processing:

[0864] 1) Check DRAM space remaining is sufficient to download the next band.

[0865] 2) Download the next band with the band header to DRAM.

[0866] 3) When the complete band header has been downloaded, process the band header according to whichever band-related register updating mechanism is being used.

[0867] Poll ISISlaves for DRAM status and download compressed data to ISISlaves.
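The check-then-download pattern used for each band can be sketched as follows. This is only an illustration under stated assumptions: `BandStore`, `download_page` and the byte-count accounting are hypothetical names and are not part of the SoPEC design.

```python
from collections import deque


class BandStore:
    """Hypothetical model of the compressed band store in SoPEC DRAM."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.bands = deque()

    def free_space(self):
        """Bytes of band store still available."""
        return self.capacity - self.used

    def download(self, band):
        """Store a band only if it fits; return True on success."""
        if len(band) > self.free_space():
            return False  # caller must wait until a finished band frees space
        self.bands.append(band)
        self.used += len(band)
        return True

    def finish_band(self):
        """Release the oldest band, as when its finished band flag is asserted."""
        band = self.bands.popleft()
        self.used -= len(band)


def download_page(store, bands):
    """Download bands in order, stopping at the first band that does not fit.

    Returns the number of bands actually stored.
    """
    stored = 0
    for band in bands:
        if not store.download(band):
            break
        stored += 1
    return stored
```

In this sketch the host would retry the remaining bands after `finish_band` has released space.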

[0868] 10.3.5 Start Printing

[0869] 1) Wait until at least one band of the first page has been downloaded.

[0870] 2) Start all the PEP Units by writing to their Go registers, via PCU commands executed from DRAM or direct CPU writes, in the suggested order defined in Table.

[0871] 3) Print ready interrupt occurs (from PHI). Poll ISISlaves until print ready interrupt.

[0872] 4) Start motor control (which may be on an ISISlave SoPEC), if first page, otherwise feed the next page. This step could occur before the print ready interrupt.

[0873] 5) Drive LEDs, monitor paper status (which may be on an ISISlave SoPEC).

[0874] 6) Wait for page alignment via page sensor(s) GPIO interrupt (which may be on an ISISlave SoPEC).

[0875] 7) If the LineSyncMaster is a SoPEC its CPU instructs PHI to start producing master line syncs. Otherwise wait for an external device to produce line syncs.

[0876] 8) Continue to download bands and process page and band headers for next page.

[0877] 10.3.6 Next Page(s) Download

[0878] As for first page download, performed during printing of current page.

[0879] 10.3.7 Between Bands

[0880] When the finished band flags are asserted, band-related registers in the CDU, LBD and TE need to be re-programmed. This can be via PCU commands from DRAM. Typically only 3-5 commands per decompression unit need to be executed. These registers can also be reprogrammed directly by the CPU or by updating from shadow registers. The finished band flag interrupts to the CPU tell the CPU that the area of memory associated with the band is now free.

[0881] 10.3.8 During Page Print

[0882] Typically during page printing ink usage is communicated to the QA chips.

[0883] 1) Calculate ink printed (from PHI).

[0884] 2) Decrement ink remaining (via QA chips).

[0885] 3) Check amount of ink remaining (via QA chips). This operation may be better performed while the page is being printed rather than at the end of the page.
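The three ink accounting steps can be illustrated with a short sketch. The function name, the per-plane dictionaries and the use of dot counts as the unit of ink are assumptions made for the example; the actual exchange with the QA chips is not modelled.

```python
def account_ink(remaining, dots_fired):
    """Decrement per-plane ink counts and report which planes are exhausted.

    remaining:  dict mapping ink plane name -> dots of ink left (hypothetical unit)
    dots_fired: dict mapping ink plane name -> dots fired, as reported by the PHI
    Returns (new_remaining, empty_planes).
    """
    new_remaining = {}
    empty_planes = []
    for plane, left in remaining.items():
        # Clamp at zero so an overdrawn plane is reported as empty, not negative.
        left_after = max(0, left - dots_fired.get(plane, 0))
        new_remaining[plane] = left_after
        if left_after == 0:
            empty_planes.append(plane)
    return new_remaining, empty_planes
```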

[0886] 10.3.9 Page Finish

[0887] These operations are typically performed when the page is finished:

[0888] 1) Page finished interrupt occurs from PHI. Poll ISISlaves for page finished interrupts.

[0889] 2) Shutdown the PEP blocks by de-asserting their Go registers in the suggested order in Table. This will set the PEP Unit state-machines to their startup states.

[0890] 3) Communicate ink usage to QA chips, if required.

[0891] 10.3.10 Start of Next Page

[0892] These operations are typically performed before printing the next page:

[0893] 1) Re-program the PEP Units via PCU command processing from DRAM based on page header.

[0894] 2) Go to Start printing.

[0895] 10.3.11 End of Document

[0896] 1) Stop motor control. This may be on an ISISlave SoPEC.

[0897] 10.3.12 Sleep Mode

[0898] The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block [16]. This may be as a result of a command from the host or as a result of a timeout.

[0899] 1) Inform host PC of which parts of SoPEC system are about to sleep.

[0900] 2) Instruct ISISlaves to enter sleep mode.

[0901] 3) Store reusable cryptographic results in Power-Safe Storage (PSS).

[0902] 4) Put ISIMaster SoPEC into defined sleep mode.

[0903] 10.4 Normal Operation in a Multi-SoPEC System—ISISlave SoPEC

[0904] This section outlines typical operation of an ISISlave SoPEC in a multi-SoPEC system. The ISIMaster can be another SoPEC or an ISI-Bridge chip. The ISISlave communicates with the host either via the ISIMaster or using a direct connection such as USB. For this use case we consider only an ISISlave that does not have a direct host connection. Buffer management in a SoPEC system is normally performed by the host.

[0905] 10.4.1 Powerup

[0906] Powerup describes SoPEC initialisation following an external reset or the watchdog timer system reset.

[0907] A typical powerup sequence is:

[0908] 1) Execute reset sequence for complete SoPEC.

[0909] 2) CPU boot from ROM.

[0910] 3) Basic configuration of CPU peripherals, SCB and DIU. DRAM initialisation.

[0911] 4) Download and authentication of program (see Section 10.5.3).

[0912] 5) Execution of program from DRAM.

[0913] 6) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

[0914] 7) SoPEC identification by sampling GPIO pins to determine ISIId. Communicate ISIId to ISIMaster.

[0915] 8) Download and authenticate any further datasets.

[0916] 10.4.2 ISI Wakeup

[0917] The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block [16]. Normally the CPU sub-system and the DRAM will be put in sleep mode but the SCB and power-safe storage (PSS) will still be enabled.

[0918] Wakeup describes SoPEC recovery from sleep mode with the SCB and power-safe storage (PSS) still enabled. In an ISISlave SoPEC, wakeup can be initiated following an ISI reset from the SCB. A typical ISI wakeup sequence is:

[0919] 1) Execute reset sequence for sections of SoPEC in sleep mode.

[0920] 2) CPU boot from ROM, if CPU-subsystem was in sleep mode.

[0921] 3) Basic configuration of CPU peripherals and DIU, and DRAM initialisation, if required.

[0922] 4) Download and authentication of program using results in Power-Safe Storage (PSS) (see Section 10.5.3).

[0923] 5) Execution of program from DRAM.

[0924] 6) Retrieve operating parameters from PRINTER_QA and authenticate operating parameters.

[0925] 7) SoPEC identification by sampling GPIO pins to determine ISIId. Communicate ISIId to ISIMaster.

[0926] 8) Download and authenticate any further datasets.

[0927] 10.4.3 Print Initialization

[0928] This sequence is typically performed at the start of a print job following powerup or wakeup:

[0929] 1) Check amount of ink remaining via QA chips.

[0930] 2) Download static data, e.g. dither matrices and dead nozzle tables, from ISI to DRAM.

[0931] 3) Check printhead temperature, if required, and configure printhead with firing pulse profile etc. accordingly.

[0932] 4) Initiate printhead pre-heat sequence, if required.

[0933] 10.4.4 First Page Download

[0934] Buffer management in a SoPEC system is normally performed by the host via the ISI.

[0935] 1) Check DRAM space remaining is sufficient to download the first band.

[0936] 2) The host downloads the first band (with the page header) to DRAM via the ISI.

[0937] 3) When the complete page header has been downloaded, process the page header, calculate PEP register commands and write directly to PEP registers or to DRAM.

[0938] 4) If PEP register commands have been written to DRAM, execute PEP commands from DRAM via PCU.

[0939] Remaining first page bands download and processing:

[0940] 1) Check DRAM space remaining is sufficient to download the next band.

[0941] 2) The host downloads the next band (with the band header) to DRAM via the ISI.

[0942] 3) When the complete band header has been downloaded, process the band header according to whichever band-related register updating mechanism is being used.

[0943] 10.4.5 Start Printing

[0944] 1) Wait until at least one band of the first page has been downloaded.

[0945] 2) Start all the PEP Units by writing to their Go registers, via PCU commands executed from DRAM or direct CPU writes, in the order defined in Table.

[0946] 3) Print ready interrupt occurs (from PHI). Communicate to PrintMaster via ISI.

[0947] 4) Start motor control, if attached to this ISISlave, when requested by PrintMaster, if first page, otherwise feed next page. This step could occur before the print ready interrupt.

[0948] 5) Drive LEDs, monitor paper status, if on this ISISlave SoPEC, when requested by PrintMaster.

[0949] 6) Wait for page alignment via page sensor(s) GPIO interrupt, if on this ISISlave SoPEC, and send to PrintMaster.

[0950] 7) Wait for line sync and commence printing.

[0951] 8) Continue to download bands and process page and band headers for next page.

[0952] 10.4.6 Next Page(s) Download

[0953] As for first page download, performed during printing of current page.

[0954] 10.4.7 Between Bands

[0955] When the finished band flags are asserted, band-related registers in the CDU, LBD and TE need to be re-programmed. This can be via PCU commands from DRAM. Typically only 3-5 commands per decompression unit need to be executed. These registers can also be reprogrammed directly by the CPU or by updating from shadow registers. The finished band flag interrupts to the CPU tell the CPU that the area of memory associated with the band is now free.

[0956] 10.4.8 During Page Print

[0957] Typically during page printing ink usage is communicated to the QA chips.

[0958] 1) Calculate ink printed (from PHI).

[0959] 2) Decrement ink remaining (via QA chips).

[0960] 3) Check amount of ink remaining (via QA chips). This operation may be better performed while the page is being printed rather than at the end of the page.

[0961] 10.4.9 Page Finish

[0962] These operations are typically performed when the page is finished:

[0963] 1) Page finished interrupt occurs from PHI. Communicate page finished interrupt to PrintMaster.

[0964] 2) Shutdown the PEP blocks by de-asserting their Go registers in the suggested order in Table. This will set the PEP Unit state-machines to their startup states.

[0965] 3) Communicate ink usage to QA chips, if required.

[0966] 10.4.10 Start of Next Page

[0967] These operations are typically performed before printing the next page:

[0968] 1) Re-program the PEP Units via PCU command processing from DRAM based on page header.

[0969] 2) Go to Start printing.

[0970] 10.4.11 End of Document

[0971] Stop motor control, if attached to this ISISlave, when requested by PrintMaster.

[0972] 10.4.12 Powerdown

[0973] In this mode SoPEC is no longer powered.

[0974] 1) Powerdown ISISlave SoPEC when instructed by ISIMaster.

[0975] 10.4.13 Sleep

[0976] The CPU can put different sections of SoPEC into sleep mode by writing to registers in the CPR block [16]. This may be as a result of a command from the host or ISIMaster or as a result of a timeout.

[0977] 1) Store reusable cryptographic results in Power-Safe Storage (PSS).

[0978] 2) Put SoPEC into defined sleep mode.

[0979] 10.5 Security Use Cases

[0980] Please see the ‘SoPEC Security Overview’ [9] document for a more complete description of SoPEC security issues. The SoPEC boot operation is described in the ROM chapter of the SoPEC hardware design specification, Section 17.2.

[0981] 10.5.1 Communication with the QA Chips

[0982] Communication between SoPEC and the QA chips (i.e. INK_QA and PRINTER_QA) will take place on at least a per power cycle and per page basis. Communication with the QA chips has three principal purposes: validating the presence of genuine QA chips (i.e. the printer is using approved consumables), validating the amount of ink remaining in the cartridge and authenticating the operating parameters for the printer. After each page has been printed, SoPEC is expected to communicate the number of dots fired per ink plane to the QA chipset. SoPEC may also initiate decoy communications with the QA chips from time to time.

[0983] Process:

[0984] When validating ink consumption SoPEC is expected to principally act as a conduit between the PRINTER_QA and INK_QA chips and to take certain actions (basically enable or disable printing and report status to host PC) based on the result. The communication channels are insecure but all traffic is signed to guarantee authenticity.

[0985] Known Weaknesses

[0986] All communication to the QA chips is over the LSS interfaces using a serial communication protocol. This is open to observation and so the communication protocol could be reverse engineered. In this case both the PRINTER_QA and INK_QA chips could be replaced by impostor devices (e.g. a single FPGA) that successfully emulated the communication protocol. As this would require physical modification of each printer this is considered to be an acceptably low risk. Any messages that are not signed by one of the symmetric keys (such as the SoPEC_id_key) could be reverse engineered. The impostor device must also have access to the appropriate keys to crack the system.

[0987] If the secret keys in the QA chips are exposed or cracked then the system, or parts of it, is compromised.

[0988] Assumptions:

[0989] [1] The QA chips are not involved in the authentication of downloaded SoPEC code.

[0990] [2] The QA chip in the ink cartridge (INK_QA) does not directly affect the operation of the cartridge in any way, i.e. it does not inhibit the flow of ink etc.

[0991] [3] The INK_QA and PRINTER_QA chips are identical in their virgin state. They only become an INK_QA or PRINTER_QA after their FlashROM has been programmed.

[0992] 10.5.2 Authentication of Downloaded Code in a Single SoPEC System

[0993] Process:

[0994] 1) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster (unless the SoPEC CPU has explicitly disabled this function).

[0995] 2) The program is downloaded to the embedded DRAM.

[0996] 3) The CPU calculates a SHA-1 hash digest of the downloaded program.

[0997] 4) The ResetSrc register in the CPR block is read to determine whether or not a power-on reset occurred.

[0998] 5) If a power-on reset occurred the signature of the downloaded code (which needs to be in a known location such as the first or last N bytes of the downloaded code) is decrypted using the Silverbrook public boot0key stored in ROM. This decrypted signature is the expected SHA-1 hash of the accompanying program. The encryption algorithm is likely to be a public key algorithm such as RSA. If a power-on reset did not occur then the expected SHA-1 hash is retrieved from the PSS and the compute intensive decryption is not required.

[0999] 6) The calculated and expected hash values are compared and if they match then the program's authenticity has been verified.

[1000] 7) If the hash values do not match then the host PC is notified of the failure and the SoPEC will await a new program download.

[1001] 8) If the hash values match then the CPU starts executing the downloaded program.

[1002] 9) If, as is very likely, the downloaded program wishes to download subsequent programs (such as OEM code) it is responsible for ensuring the authenticity of everything it downloads. The downloaded program may contain public keys that are used to authenticate subsequent downloads, thus forming a hierarchy of authentication. The SoPEC ROM does not control these authentications; it is solely concerned with verifying that the first program downloaded has come from a trusted source.

[1003] 10) At some subsequent point OEM code starts executing. The Silverbrook supervisor code acts as an O/S to the OEM user mode code. The OEM code must access most SoPEC functionality via system calls to the Silverbrook code.

[1004] 11) The OEM code is expected to perform some simple ‘turn on the lights’ tasks after which the host PC is informed that the printer is ready to print and the Start Printing use case comes into play.
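The hash comparison in steps 3 to 8 can be sketched as below. This is a minimal illustration, not the ROM code: the function name, the `pss` dictionary standing in for Power-Safe Storage and the `decrypt_signature` callable standing in for public-key decryption with the boot0key are all hypothetical. Caching the expected hash immediately after decryption is also a simplification; the sleep sequence above stores reusable cryptographic results in the PSS before sleeping.

```python
import hashlib
import hmac


def authenticate_program(program, signature, power_on_reset, pss, decrypt_signature):
    """Return True if the program's SHA-1 digest matches the expected digest.

    program:           bytes of the downloaded program (signature excluded)
    signature:         encrypted signature that accompanied the program
    power_on_reset:    True if ResetSrc indicates a power-on reset
    pss:               dict standing in for Power-Safe Storage (hypothetical)
    decrypt_signature: callable standing in for decryption with the public boot0key
    """
    calculated = hashlib.sha1(program).digest()
    if power_on_reset:
        # Expensive path: recover the expected hash from the signature.
        expected = decrypt_signature(signature)
        pss["expected_hash"] = expected  # assumption: cached here for later wakeups
    else:
        # Cheap path: reuse the result kept in Power-Safe Storage.
        expected = pss["expected_hash"]
    return hmac.compare_digest(calculated, expected)
```

On a mismatch the caller would notify the host PC and await a new program download, as in step 7.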

[1005] Known Weaknesses:

[1006] If the Silverbrook private boot0key is exposed or cracked then the system is seriously compromised. A ROM mask change would be required to reprogram the boot0key.

[1007] 10.5.3 Authentication of Downloaded Code in a Multi-SoPEC System

[1008] 10.5.3.1 ISIMaster SoPEC Process:

[1009] 1) SoPEC identification by activity on USB end-points 2-4 indicates it is the ISIMaster.

[1010] 2) The SCB is configured to broadcast the data received from the host PC.

[1011] 3) The program is downloaded to the embedded DRAM and broadcast to all ISISlave SoPECs over the ISI.

[1012] 4) The CPU calculates a SHA-1 hash digest of the downloaded program.

[1013] 5) The ResetSrc register in the CPR block is read to determine whether or not a power-on reset occurred.

[1014] 6) If a power-on reset occurred the signature of the downloaded code (which needs to be in a known location such as the first or last N bytes of the downloaded code) is decrypted using the Silverbrook public boot0key stored in ROM. This decrypted signature is the expected SHA-1 hash of the accompanying program. The encryption algorithm is likely to be a public key algorithm such as RSA. If a power-on reset did not occur then the expected SHA-1 hash is retrieved from the PSS and the compute intensive decryption is not required.

[1015] 7) The calculated and expected hash values are compared and if they match then the program's authenticity has been verified.

[1016] 8) If the hash values do not match then the host PC is notified of the failure and the SoPEC will await a new program download.

[1017] 9) If the hash values match then the CPU starts executing the downloaded program.

[1018] 10) It is likely that the downloaded program will poll each ISISlave SoPEC for the result of its authentication process and to determine the number of slaves present and their ISIIds.

[1019] 11) If any ISISlave SoPEC reports a failed authentication then the ISIMaster communicates this to the host PC and the SoPEC will await a new program download.

[1020] 12) If all ISISlaves report successful authentication then the downloaded program is responsible for the downloading, authentication and distribution of subsequent programs within the multi-SoPEC system.

[1021] 13) At some subsequent point OEM code starts executing. The Silverbrook supervisor code acts as an O/S to the OEM user mode code. The OEM code must access most SoPEC functionality via system calls to the Silverbrook code.

[1022] 14) The OEM code is expected to perform some simple ‘turn on the lights’ tasks after which the master SoPEC determines that all SoPECs are ready to print. The host PC is informed that the printer is ready to print and the Start Printing use case comes into play.
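The polling in steps 10 to 12 might look like the following sketch. The function name, the range of candidate ISIIds and the `query_status` callable (which stands in for an ISI status exchange initiated by the ISIMaster) are assumptions made for illustration only.

```python
def poll_slaves(candidate_ids, query_status):
    """Poll each candidate ISIId for its authentication result.

    candidate_ids: iterable of ISIIds to probe (hypothetical range)
    query_status:  callable(isi_id) returning True (authenticated),
                   False (failed), or None if no slave answers at that ISIId
    Returns (present, failed): ISIIds that answered, and those reporting failure.
    """
    present, failed = [], []
    for isi_id in candidate_ids:
        status = query_status(isi_id)
        if status is None:
            continue  # no ISISlave at this ISIId
        present.append(isi_id)
        if not status:
            failed.append(isi_id)
    return present, failed
```

An empty `failed` list corresponds to step 12; a non-empty one corresponds to step 11, where the failure is reported to the host PC.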

[1023] 10.5.3.2 ISISlave SoPEC Process:

[1024] 1) When the CPU comes out of reset the SCB will be in slave mode, and the SCB is already configured to receive data from both the ISI and USB.

[1025] 2) The program is downloaded (via ISI or USB) to embedded DRAM.

[1026] 3) The CPU calculates a SHA-1 hash digest of the downloaded program.

[1027] 4) The ResetSrc register in the CPR block is read to determine whether or not a power-on reset occurred.

[1028] 5) If a power-on reset occurred the signature of the downloaded code (which needs to be in a known location such as the first or last N bytes of the downloaded code) is decrypted using the Silverbrook public boot0key stored in ROM. This decrypted signature is the expected SHA-1 hash of the accompanying program. The encryption algorithm is likely to be a public key algorithm such as RSA. If a power-on reset did not occur then the expected SHA-1 hash is retrieved from the PSS and the compute intensive decryption is not required.

[1029] 6) The calculated and expected hash values are compared and if they match then the program's authenticity has been verified.

[1030] 7) If the hash values do not match, then the ISISlave device will await a new program download.

[1031] 8) If the hash values match then the CPU starts executing the downloaded program.

[1032] 9) It is likely that the downloaded program will communicate the result of its authentication process to the ISIMaster. The downloaded program is responsible for determining the SoPEC's ISIId, and for receiving and authenticating any subsequent programs.

[1033] 10) At some subsequent point OEM code starts executing. The Silverbrook supervisor code acts as an O/S to the OEM user mode code. The OEM code must access most SoPEC functionality via system calls to the Silverbrook code.

[1034] 11) The OEM code is expected to perform some simple ‘turn on the lights’ tasks after which the master SoPEC is informed that this slave is ready to print. The Start Printing use case then comes into play.

[1035] Known Weaknesses

[1036] If the Silverbrook private boot0key is exposed or cracked then the system is seriously compromised.

[1037] ISI is an open interface, i.e. messages sent over the ISI are in the clear. The communication channels are insecure but all traffic is signed to guarantee authenticity. As all communication over the ISI is controlled by Supervisor code on both the ISIMaster and ISISlave, this also provides some protection against software attacks.

[1038] 10.5.4 Authentication and Upgrade of Operating Parameters for a Printer

[1039] The SoPEC IC will be used in a range of printers with different capabilities (e.g. A3/A4 printing, printing speed, resolution etc.). It is expected that some printers will also have a software upgrade capability which would allow a user to purchase a license that enables an upgrade in their printer's capabilities (such as print speed). To facilitate this it must be possible to securely store the operating parameters in the PRINTER_QA chip, to securely communicate these parameters to the SoPEC and to securely reprogram the parameters in the event of an upgrade. Note that each printing SoPEC (as opposed to a SoPEC that is only used for the storage of data) will have its own PRINTER_QA chip (or at least access to a PRINTER_QA that contains the SoPEC's SoPEC_id_key). Therefore both ISIMaster and ISISlave SoPECs will need to authenticate operating parameters.

[1040] Process:

[1041] 1) Program code is downloaded and authenticated as described in sections 10.5.2 and 10.5.3 above.

[1042] 2) The program code has a function to create the SoPEC_id_key from the unique SoPEC_id that was programmed when the SoPEC was manufactured.

[1043] 3) The SoPEC retrieves the signed operating parameters from its PRINTER_QA chip. The PRINTER_QA chip uses the SoPEC_id_key (which is stored as part of the pairing process executed during printhead assembly manufacture & test) to sign the operating parameters, which are appended with a random number to thwart replay attacks.

[1044] 4) The SoPEC checks the signature of the operating parameters using its SoPEC_id_key. If this signature authentication process is successful then the operating parameters are considered valid and the overall boot process continues. If not, the error is reported to the host PC.

[1045] 5) Operating parameters may also be set or upgraded using a second key, the PrintEngineLicense_key, which is stored on the PRINTER_QA and used to authenticate the change in operating parameters.
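Steps 3 and 4 describe a symmetric-key signature over the operating parameters plus a random number. The sketch below illustrates that idea using HMAC-SHA1, which is an assumption: the text does not name the signing algorithm, nor does it say which party generates the random number. The function names and the parameter encoding are likewise hypothetical.

```python
import hashlib
import hmac


def qa_sign(sopec_id_key, params, nonce):
    """Stand-in for the PRINTER_QA chip: sign params with the nonce appended."""
    return hmac.new(sopec_id_key, params + nonce, hashlib.sha1).digest()


def sopec_verify(sopec_id_key, params, nonce, signature):
    """SoPEC side: recompute the signature and compare in constant time."""
    expected = hmac.new(sopec_id_key, params + nonce, hashlib.sha1).digest()
    return hmac.compare_digest(expected, signature)
```

Because the nonce is covered by the signature, a signature captured from an earlier exchange does not verify against a fresh nonce, which is the replay protection the text refers to.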

[1046] Known Weaknesses:

[1047] It may be possible to retrieve the unique SoPEC_id by placing the SoPEC in test mode and scanning it out. It is certainly possible to obtain it by reverse engineering the device. Either way the SoPEC_id (and by extension the SoPEC_id_key) so obtained is valid only for that specific SoPEC and so printers may only be compromised one at a time by parties with the appropriate specialised equipment. Furthermore even if the SoPEC_id is compromised, the other keys in the system, which protect the authentication of consumables and of program code, are unaffected.

[1048] 10.6 Miscellaneous Use Cases

[1049] There are many miscellaneous use cases such as the following examples. Software running on the SoPEC CPU or host will decide on what actions to take in these scenarios.

[1050] 10.6.1 Disconnect/Re-Connect of QA Chips.

[1051] 1) Disconnect of a QA chip between documents or if ink runs out mid-document.

[1052] 2) Re-connect of a QA chip once authenticated, e.g. ink cartridge replacement should allow the system to resume and print the next document.

[1053] 10.6.2 Page Arrives Before Print Ready Interrupt.

[1054] 1) Engage clutch to stop paper until print ready interrupt occurs.

[1055] 10.6.3 Dead-Nozzle Table Upgrade

[1056] This sequence is typically performed when dead nozzle information needs to be updated by performing a printhead dead nozzle test.

[1057] 1) Run printhead nozzle test sequence.

[1058] 2) Either host or SoPEC CPU converts dead nozzle information into dead nozzle table.

[1059] 3) Store dead nozzle table on host.

[1060] 4) Write dead nozzle table to SoPEC DRAM.

[1061] 10.7 Failure Mode Use Cases

[1062] 10.7.1 System Errors and Security Violations

[1063] System errors and security violations are reported to the SoPEC CPU and host. Software running on the SoPEC CPU or host will then decide what actions to take.

[1064] Silverbrook code authentication failure.

[1065] 1) Notify host PC of authentication failure.

[1066] 2) Abort print run.

[1067] OEM code authentication failure.

[1068] 1) Notify host PC of authentication failure.

[1069] 2) Abort print run.

[1070] Invalid QA chip(s).

[1071] 1) Report to host PC.

[1072] 2) Abort print run.

[1073] MMU security violation interrupt.

[1074] 1) This is handled by exception handler.

[1075] 2) Report to host PC.

[1076] 3) Abort print run.

[1077] Invalid address interrupt from PCU.

[1078] 1) This is handled by exception handler.

[1079] 2) Report to host PC.

[1080] 3) Abort print run.

[1081] Watchdog timer interrupt.

[1082] 1) This is handled by exception handler.

[1083] 2) Report to host PC.

[1084] 3) Abort print run.

[1085] Host PC does not acknowledge message that SoPEC is about to power down.

[1086] 1) Power down anyway.
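The failure responses listed above amount to a lookup from error condition to an ordered list of actions. A minimal sketch of such a table follows; the enum members, the string keys and the `actions_for` helper are hypothetical names chosen for the example, and the text leaves the final decision to software on the SoPEC CPU or host.

```python
from enum import Enum, auto


class Action(Enum):
    RUN_EXCEPTION_HANDLER = auto()
    REPORT_TO_HOST = auto()
    ABORT_PRINT_RUN = auto()
    POWER_DOWN = auto()


_REPORT_AND_ABORT = [Action.REPORT_TO_HOST, Action.ABORT_PRINT_RUN]
_HANDLE_REPORT_ABORT = [Action.RUN_EXCEPTION_HANDLER] + _REPORT_AND_ABORT

# Ordered responses, mirroring the numbered steps under each failure above.
ERROR_ACTIONS = {
    "silverbrook_auth_failure": _REPORT_AND_ABORT,
    "oem_auth_failure": _REPORT_AND_ABORT,
    "invalid_qa_chip": _REPORT_AND_ABORT,
    "mmu_security_violation": _HANDLE_REPORT_ABORT,
    "pcu_invalid_address": _HANDLE_REPORT_ABORT,
    "watchdog_timer": _HANDLE_REPORT_ABORT,
    "power_down_not_acknowledged": [Action.POWER_DOWN],
}


def actions_for(error):
    """Return a copy of the ordered action list for a named error condition."""
    return list(ERROR_ACTIONS[error])
```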

[1087] 10.7.2 Printing Errors

[1088] Printing errors are reported to the SoPEC CPU and host. Software running on the host or SoPEC CPU will then decide what actions to take.

[1089] Insufficient space available in SoPEC compressed band-store to download a band.

[1090] 1) Report to the host PC.

[1091] Insufficient ink to print.

[1092] 1) Report to host PC.

[1093] Page not downloaded in time while printing.

[1094] 1) Buffer underrun interrupt will occur.

[1095] 2) Report to host PC and abort print run.

[1096] JPEG decoder error interrupt.

[1097] 1) Report to host PC.

[1098] CPU Subsystem

[1099] 11 Central Processing Unit (CPU)

[1100] 11.1 Overview

[1101] The CPU block consists of the CPU core, MMU, cache and associated logic. The principal tasks for the program running on the CPU to fulfill in the system are:

[1102] Communications:

[1103] Control the flow of data from the USB interface to the DRAM and ISI

[1104] Communication with the host via USB or ISI

[1105] Running the USB device driver

[1106] PEP Subsystem Control:

[1107] Page and band header processing (may possibly be performed on host PC)

[1108] Configure printing options on a per band, per page, per job or per power cycle basis

[1109] Initiate page printing operation in the PEP subsystem

[1110] Retrieve dead nozzle information from the printhead interface (PHI) and forward to the host PC

[1111] Select the appropriate firing pulse profile from a set of predefined profiles based on the printhead characteristics

[1112] Retrieve printhead temperature via the PHI

[1113] Security:

[1114] Authenticate downloaded program code

[1115] Authenticate printer operating parameters

[1116] Authenticate consumables via the PRINTER_QA and INK_QA chips

[1117] Monitor ink usage

[1118] Isolation of OEM code from direct access to the system resources

[1119] Other:

[1120] Drive the printer motors using the GPIO pins

[1121] Monitoring the status of the printer (paper jam, tray empty etc.)

[1122] Driving front panel LEDs

[1123] Perform post-boot initialisation of the SoPEC device

[1124] Memory management (likely to be in conjunction with the host PC)

[1125] Miscellaneous housekeeping tasks

[1126] To control the Print Engine Pipeline the CPU is required to provide a level of performance at least equivalent to a 16-bit Hitachi H8-3664 microcontroller running at 16 MHz. An as yet undetermined amount of additional CPU performance is needed to perform the other tasks, as well as to provide the potential for such activity as Netpage page assembly and processing, RIPing etc. The extra performance required is dominated by the signature verification task and the SCB (including the USB) management task. An operating system is not required at present. A number of CPU cores have been evaluated and the LEON P1754 is considered to be the most appropriate solution. A diagram of the CPU block is shown in FIG. 15 below.

[1127] 11.2 Definitions of I/Os

TABLE 14 CPU Subsystem I/Os
Port name | Pins | I/O | Description
Clocks and Resets
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
pclk | 1 | In | Global clock
CPU to DIU DRAM interface
cpu_adr[21:2] | 20 | Out | Address bus for both DRAM and peripheral access
cpu_dataout[31:0] | 32 | Out | Data out to both DRAM and peripheral devices. This should be driven at the same time as the cpu_adr and request signals.
dram_cpu_data[255:0] | 256 | In | Read data from the DRAM
cpu_diu_rreq | 1 | Out | Read request to the DIU DRAM
diu_cpu_rack | 1 | In | Acknowledge from DIU that read request has been accepted.
diu_cpu_rvalid | 1 | In | Signal from DIU telling SoPEC Unit that valid read data is on the dram_cpu_data bus
cpu_diu_wdatavalid | 1 | Out | Signal from the CPU to the DIU indicating that the data currently on the cpu_diu_wdata bus is valid and should be committed to the DIU posted write buffer
diu_cpu_write_rdy | 1 | In | Signal from the DIU indicating that the posted write buffer is empty
cpu_diu_wdadr[21:4] | 18 | Out | Write address bus to the DIU
cpu_diu_wdata[127:0] | 128 | Out | Write data bus to the DIU
cpu_diu_wmask[15:0] | 16 | Out | Write mask for the cpu_diu_wdata bus. Each bit corresponds to a byte of the 128-bit cpu_diu_wdata bus.
CPU to peripheral blocks
cpu_rwn | 1 | Out | Common read/not-write signal from the CPU
cpu_acode[1:0] | 2 | Out | CPU access code signals. cpu_acode[0] - Program (0) / Data (1) access; cpu_acode[1] - User (0) / Supervisor (1) access
cpu_cpr_sel | 1 | Out | CPR block select.
cpr_cpu_rdy | 1 | In | Ready signal to the CPU. When cpr_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the CPR block and for a read cycle this means the data on cpr_cpu_data is valid.
cpr_cpu_berr | 1 | In | CPR bus error signal to the CPU.
cpr_cpu_data[31:0] | 32 | In | Read data bus from the CPR block
cpu_gpio_sel | 1 | Out | GPIO block select.
gpio_cpu_rdy | 1 | In | GPIO ready signal to the CPU.
gpio_cpu_berr | 1 | In | GPIO bus error signal to the CPU.
gpio_cpu_data[31:0] | 32 | In | Read data bus from the GPIO block
cpu_icu_sel | 1 | Out | ICU block select.
icu_cpu_rdy | 1 | In | ICU ready signal to the CPU.
icu_cpu_berr | 1 | In | ICU bus error signal to the CPU.
icu_cpu_data[31:0] | 32 | In | Read data bus from the ICU block
cpu_lss_sel | 1 | Out | LSS block select.
lss_cpu_rdy | 1 | In | LSS ready signal to the CPU.
lss_cpu_berr | 1 | In | LSS bus error signal to the CPU.
lss_cpu_data[31:0] | 32 | In | Read data bus from the LSS block
cpu_pcu_sel | 1 | Out | PCU block select.
pcu_cpu_rdy | 1 | In | PCU ready signal to the CPU.
pcu_cpu_berr | 1 | In | PCU bus error signal to the CPU.
pcu_cpu_data[31:0] | 32 | In | Read data bus from the PCU block
cpu_scb_sel | 1 | Out | SCB block select.
scb_cpu_rdy | 1 | In | SCB ready signal to the CPU.
scb_cpu_berr | 1 | In | SCB bus error signal to the CPU.
scb_cpu_data[31:0] | 32 | In | Read data bus from the SCB block
cpu_tim_sel | 1 | Out | Timers block select.
tim_cpu_rdy | 1 | In | Timers block ready signal to the CPU.
tim_cpu_berr | 1 | In | Timers bus error signal to the CPU.
tim_cpu_data[31:0] | 32 | In | Read data bus from the Timers block
cpu_rom_sel | 1 | Out | ROM block select.
rom_cpu_rdy | 1 | In | ROM block ready signal to the CPU.
rom_cpu_berr | 1 | In | ROM bus error signal to the CPU.
rom_cpu_data[31:0] | 32 | In | Read data bus from the ROM block
cpu_pss_sel | 1 | Out | PSS block select.
pss_cpu_rdy | 1 | In | PSS block ready signal to the CPU.
pss_cpu_berr | 1 | In | PSS bus error signal to the CPU.
pss_cpu_data[31:0] | 32 | In | Read data bus from the PSS block
cpu_diu_sel | 1 | Out | DIU register block select.
diu_cpu_rdy | 1 | In | DIU register block ready signal to the CPU.
diu_cpu_berr | 1 | In | DIU bus error signal to the CPU.
diu_cpu_data[31:0] | 32 | In | Read data bus from the DIU block
Interrupt signals
icu_cpu_ilevel[3:0] | 3 | In | An interrupt is asserted by driving the appropriate priority level on icu_cpu_ilevel. These signals must remain asserted until the CPU executes an interrupt acknowledge cycle.
 | 3 | Out | Indicates the level of the interrupt the CPU is acknowledging when cpu_iack is high
cpu_iack | 1 | Out | Interrupt acknowledge signal. The exact timing depends on the CPU core implementation
Debug signals
diu_cpu_debug_valid | 1 | In | Signal indicating the data on the diu_cpu_data bus is valid debug data.
tim_cpu_debug_valid | 1 | In | Signal indicating the data on the tim_cpu_data bus is valid debug data.
scb_cpu_debug_valid | 1 | In | Signal indicating the data on the scb_cpu_data bus is valid debug data.
pcu_cpu_debug_valid | 1 | In | Signal indicating the data on the pcu_cpu_data bus is valid debug data.
lss_cpu_debug_valid | 1 | In | Signal indicating the data on the lss_cpu_data bus is valid debug data.
icu_cpu_debug_valid | 1 | In | Signal indicating the data on the icu_cpu_data bus is valid debug data.
gpio_cpu_debug_valid | 1 | In | Signal indicating the data on the gpio_cpu_data bus is valid debug data.
cpr_cpu_debug_valid | 1 | In | Signal indicating the data on the cpr_cpu_data bus is valid debug data.
debug_data_out | 32 | Out | Output debug data to be muxed on to the GPIO & PHI pins
debug_data_valid | 1 | Out | Debug valid signal indicating the validity of the data on debug_data_out. This signal is used in all debug configurations
debug_cntrl | 33 | Out | Control signal for each PHI bound debug data line indicating whether or not the debug data should be selected by the pin mux

[1128] 11.3 Realtime Requirements

[1129] The SoPEC realtime requirements have yet to be fully determined but they may be split into three categories: hard, firm and soft.

[1130] 11.3.1 Hard Realtime Requirements

[1131] Hard requirements are tasks that must be completed before a certain deadline; failure to do so will result in an error perceptible to the user (printing stops or functions incorrectly). There are three hard realtime tasks:

[1132] Motor control: The motors which feed the paper through the printer at a constant speed during printing are driven directly by the SoPEC device. Four periodic signals with different phase relationships need to be generated to ensure the paper travels smoothly through the printer. The generation of these signals is handled by the GPIO hardware (see section 13.2 for more details) but the CPU is responsible for enabling these signals (i.e. to start or stop the motors) and coordinating the movement of the paper with the printing operation of the printhead.

[1133] Buffer management: Data enters the SoPEC via the SCB at an uneven rate and is consumed by the PEP subsystem at a different rate. The CPU is responsible for managing the DRAM buffers to ensure that neither overrun nor underrun occurs. This buffer management is likely to be performed under the direction of the host.

[1134] Band processing: In certain cases PEP registers may need to be updated between bands. As the timing requirements are most likely too stringent to be met by direct CPU writes to the PCU, a more likely scenario is that a set of shadow registers will be programmed in the compressed page units before the current band is finished and copied to band related registers by the finished band signals, so that the processing of the next band will continue immediately. An alternative solution is that the CPU will construct a DRAM based set of commands (see section 21.8.5 for more details) that can be executed by the PCU. The task for the CPU here is to parse the band headers stored in DRAM and generate a DRAM based set of commands for the next number of bands. The location of the DRAM based set of commands must then be written to the PCU before the current band has been processed by the PEP subsystem. It is also conceivable (but currently considered unlikely) that the host PC could create the DRAM based commands. In this case the CPU will only be required to point the PCU to the correct location in DRAM to execute commands from.

[1135] 11.3.2 Firm Requirements

[1136] Firm requirements are tasks that should be completed by a certain time; failure to do so will result in a degradation of performance but not an error. The majority of the CPU tasks for SoPEC fall into this category including all interactions with the QA chips, program authentication, page feeding, configuring PEP registers for a page or job, determining the firing pulse profile, communication of printer status to the host over the USB and the monitoring of ink usage. The authentication of downloaded programs and messages will be the most compute intensive operation the CPU will be required to perform. Initial investigations indicate that the LEON processor, running at 160 MHz, will easily perform three authentications in under a second.

TABLE 15 Expected firm requirements
Requirement | Duration
Power-on to start of printing first page [USB and slave SoPEC enumeration, 3 or more RSA signature verifications, code and compressed page data download and chip initialisation] | ~8 secs ??
Wake-up from sleep mode to start printing [3 or more SHA-1/RSA operations, code and compressed page data download and chip re-initialisation] | ~2 secs
Authenticate ink usage in the printer | ~0.5 secs
Determining firing pulse profile | ~0.1 secs
Page feeding, gap between pages | OEM dependent
Communication of printer status to host PC | ~10 ms
Configuring PEP registers | ??

[1137] 11.3.3 Soft Requirements

[1138] Soft requirements are tasks that need to be done but there are only light time constraints on when they need to be done. These tasks are performed by the CPU when there are no pending higher priority tasks. As the SoPEC CPU is expected to be lightly loaded these tasks will mostly be executed soon after they are scheduled.

[1139] 11.4 Bus Protocols

[1140] As can be seen from FIG. 15 above there are different buses in the CPU block and different protocols are used for each bus. There are three buses in operation:

[1141] 11.4.1 AHB Bus

[1142] The LEON CPU core uses an AMBA 2.0 AHB bus to communicate with memory and peripherals (usually via an APB bridge). See the AMBA specification [38], section 5 of the LEON users manual [37] and section 11.6.6.1 of this document for more details.

[1143] 11.4.2 CPU to DIU Bus

[1144] This bus conforms to the DIU bus protocol described in Section 20.14.8. Note that the address bus used for DIU reads (i.e. cpu_adr[21:2]) is also that used for CPU subsystem bus accesses while the write address bus (cpu_diu_wadr) and the read and write data buses (dram_cpu_data and cpu_diu_wdata) are private buses between the CPU and the DIU. The effective bus width differs between a read (256 bits) and a write (128 bits). As certain CPU instructions may require byte write access this will need to be supported by both the DRAM write buffer (in the AHB bridge) and the DIU. See section 11.6.6.1 for more details.

[1145] 11.4.3 CPU Subsystem Bus

[1146] For access to the on-chip peripherals a simple bus protocol is used. The MMU must first determine which particular block is being addressed (and that the access is a valid one) so that the appropriate block select signal can be generated. During a write access CPU write data is driven out with the address and block select signals in the first cycle of an access. The addressed slave peripheral responds by asserting its ready signal indicating that it has registered the write data and the access can complete. The write data bus is common to all peripherals and is also used for CPU writes to the embedded DRAM. A read access is initiated by driving the address and select signals during the first cycle of an access. The addressed slave responds by placing the read data on its bus and asserting its ready signal to indicate to the CPU that the read data is valid. Each block has a separate point-to-point data bus for read accesses to avoid the need for a tri-stateable bus. All peripheral accesses are 32-bit (Programming note: char or short C types should not be used to access peripheral registers). The use of the ready signal allows the accesses to be of variable length. In most cases accesses will complete in two cycles but three or four (or more) cycle accesses are likely for PEP blocks or IP blocks with a different native bus interface. All PEP blocks are accessed via the PCU which acts as a bridge. The PCU bus uses a similar protocol to the CPU subsystem bus but with the PCU as the bus master.

[1147] The duration of accesses to the PEP blocks is influenced by whether or not the PCU is executing commands from DRAM. As these commands are essentially register writes, the CPU access will need to wait until the PCU bus becomes available, which happens when a register access has been completed. This could lead to the CPU being stalled for up to 4 cycles if it attempts to access PEP blocks while the PCU is executing a command. The size and probability of this penalty are sufficiently small that it is not expected to have any significant impact on performance.

[1148] In order to support user mode (i.e. OEM code) access to certain peripherals the CPU subsystem bus propagates the CPU function code signals (cpu_acode[1:0]). These signals indicate the type of address space (i.e. User/Supervisor and Program/Data) being accessed by the CPU for each access. Each peripheral must determine whether or not the CPU is in the correct mode to be granted access to its registers and in some cases (e.g. Timers and GPIO blocks) different access permissions can apply to different registers within the block. If the CPU is not in the correct mode then the violation is flagged by asserting the block's bus error signal (block_cpu_berr) with the same timing as its ready signal (block_cpu_rdy) which remains deasserted. When this occurs invalid read accesses should return 0 and write accesses should have no effect.

[1149] FIG. 16 shows two examples of the peripheral bus protocol in action. A write to the LSS block from code running in supervisor mode is successfully completed. This is immediately followed by a read from a PEP block via the PCU from code running in user mode. As this type of access is not permitted the access is terminated with a bus error. The bus error exception processing then starts directly after this. No further accesses to the peripheral should be required as the exception handler should be located in the DRAM.

[1150] Each peripheral acts as a slave on the CPU subsystem bus and its behavior is described by the state machine in section 11.4.3.1.

[1151] 11.4.3.1 CPU Subsystem Bus Slave State Machine

[1152] CPU subsystem bus slave operation is described by the state machine in FIG. 17. This state machine will be implemented in each CPU subsystem bus slave. The only new signals mentioned here are the valid_access and reg_available signals. The valid_access signal is determined by comparing the cpu_acode value with the block or register access permissions (register permissions apply in the case of a block that allows user access on a per register basis, such as the GPIO block) and asserting valid_access if the permissions agree with the CPU mode. The reg_available signal is only required in the PCU or in blocks that are not capable of two-cycle access (e.g. blocks containing imported IP with different bus protocols). In these blocks the reg_available signal is an internal signal used to insert wait states (by delaying the assertion of block_cpu_rdy) until the CPU bus slave interface can gain access to the register.
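The valid_access decision can be illustrated with a small behavioural sketch. This is not the state machine of FIG. 17; it only models the permission comparison and the resulting ready and bus error outputs. The cpu_acode bit meanings are taken from Table 14, while the function names and the representation of permissions as a set of (mode, access type) pairs are assumptions made for illustration.

```python
# Behavioural sketch only; names and the permission-set representation are assumptions.
PROGRAM, DATA = 0, 1          # cpu_acode[0]: Program (0) / Data (1)
USER, SUPERVISOR = 0, 1       # cpu_acode[1]: User (0) / Supervisor (1)


def decode_acode(cpu_acode):
    """Split the 2-bit cpu_acode value into (mode, access_type)."""
    access_type = cpu_acode & 0x1
    mode = (cpu_acode >> 1) & 0x1
    return mode, access_type


def slave_response(cpu_acode, allowed):
    """Return (block_cpu_rdy, block_cpu_berr) for a single access.

    `allowed` is the set of (mode, access_type) pairs that the addressed
    block or register permits.
    """
    valid_access = decode_acode(cpu_acode) in allowed
    if valid_access:
        return 1, 0           # access completes normally
    return 0, 1               # ready stays deasserted, bus error is flagged


# A block that only permits supervisor data accesses.
supervisor_data_only = {(SUPERVISOR, DATA)}
ok = slave_response(0b11, supervisor_data_only)        # supervisor data access
violation = slave_response(0b01, supervisor_data_only) # user data access
```

In this sketch a user mode data access to a supervisor-only block produces a bus error with the ready signal left deasserted, matching the behaviour described for block_cpu_berr and block_cpu_rdy.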

[1153] When reading from a register that is less than 32 bits wide the CPU subsystem bus slave should return zeroes on the unused upper bits of the block_cpu_data bus.

[1154] To support debug mode the contents of the register selected for debug observation, debug_reg, are always output on the block_cpu_data bus whenever a read access is not taking place. See section 11.8 for more details of debug operation.

[1156] 11.5 LEON CPU

[1157] The LEON processor is an open-source implementation of the IEEE-1754 standard (SPARC V8) instruction set. LEON is available from and actively supported by Gaisler Research (www.gaisler.com).

[1158] The following features of the LEON-2 processor will be utilised on SoPEC:

[1159] IEEE-1754 (SPARC V8) compatible integer unit with 5-stage pipeline

[1160] Separate instruction and data cache (Harvard architecture). 1 kbyte direct mapped caches will be used for both.

[1161] Full implementation of AMBA-2.0 AHB on-chip bus

[1162] The standard release of LEON incorporates a number of peripherals and support blocks which will not be included on SoPEC. The LEON core as used on SoPEC will consist of: 1) the LEON integer unit, 2) the instruction and data caches (currently 1 kB each), 3) the cache control logic, 4) the AHB interface and 5) possibly the AHB controller (although this functionality may be implemented in the LEON AHB bridge).

[1163] The version of the LEON database that the SoPEC LEON components will be sourced from is LEON2-1.0.7 although later versions may be used if they offer worthwhile functionality or bug fixes that affect the SoPEC design.

[1164] The LEON core will be clocked using the system clock, pclk, and reset using the prst_n_section[1] signal. The ICU will assert all the hardware interrupts using the protocol described in section 11.9. The LEON hardware multipliers and floating-point unit are not required. SoPEC will use the recommended 8 register window configuration.

[1165] Further details of the SPARC V8 instruction set and the LEON processor can be found in [36] and [37] respectively.

[1166] 11.5.1 LEON Registers

[1167] Only two of the registers described in the LEON manual are implemented on SoPEC: the LEON configuration register and the Cache Control Register (CCR). The addresses of these registers are shown in Table 19. The configuration register bit fields are described below and the CCR is described in section 11.7.1.1.

[1168] 11.5.1.1 LEON Configuration Register

[1169] The LEON configuration register allows runtime software to determine the settings of LEON's various configuration options. This is a read-only register whose value for the SoPEC ASIC will be 0x1071_8C00. Further descriptions of many of the bit fields can be found in the LEON manual. The values used for SoPEC are highlighted in bold for clarity.

TABLE 16 LEON Configuration Register
Field Name | bit(s) | Description
WriteProtection | 1:0 | Write protection type. 00 - none; 01 - standard
PCICore | 3:2 | PCI core type. 00 - none; 01 - InSilicon; 10 - ESA; 11 - Other
FPUType | 5:4 | FPU type. 00 - none; 01 - Meiko
MemStatus | 6 | 0 - No memory status and failing address register present; 1 - Memory status and failing address register present
Watchdog | 7 | 0 - Watchdog timer not present (Note this refers to the LEON watchdog timer in the LEON timer block); 1 - Watchdog timer present
UMUL/SMUL | 8 | 0 - UMUL/SMUL instructions are not implemented; 1 - UMUL/SMUL instructions are implemented
UDIV/SDIV | 9 | 0 - UDIV/SDIV instructions are not implemented; 1 - UDIV/SDIV instructions are implemented
DLSZ | 11:10 | Data cache line size in 32-bit words: 00 - 1 word; 01 - 2 words; 10 - 4 words; 11 - 8 words
DCSZ | 14:12 | Data cache size in kbytes = 2^(DCSZ). SoPEC DCSZ = 0.
ILSZ | 16:15 | Instruction cache line size in 32-bit words: 00 - 1 word; 01 - 2 words; 10 - 4 words; 11 - 8 words
ICSZ | 19:17 | Instruction cache size in kbytes = 2^(ICSZ). SoPEC ICSZ = 0.
RegWin | 24:20 | The implemented number of SPARC register windows - 1. SoPEC value = 7.
UMAC/SMAC | 25 | 0 - UMAC/SMAC instructions are not implemented; 1 - UMAC/SMAC instructions are implemented
Watchpoints | 28:26 | The implemented number of hardware watchpoints. SoPEC value = 4.
SDRAM | 29 | 0 - SDRAM controller not present; 1 - SDRAM controller present
DSU | 30 | 0 - Debug Support Unit not present; 1 - Debug Support Unit present
Reserved | 31 | Reserved. SoPEC value = 0.
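As a cross-check of the register value against the bit fields, the following sketch decodes the SoPEC value 0x1071_8C00 using the field positions listed in Table 16. The function and variable names are illustrative assumptions; only the field positions and the register value come from the text.

```python
# Field positions (name, high bit, low bit) as listed in Table 16.
LEON_CONFIG_FIELDS = [
    ("WriteProtection", 1, 0),
    ("PCICore", 3, 2),
    ("FPUType", 5, 4),
    ("MemStatus", 6, 6),
    ("Watchdog", 7, 7),
    ("UMUL/SMUL", 8, 8),
    ("UDIV/SDIV", 9, 9),
    ("DLSZ", 11, 10),
    ("DCSZ", 14, 12),
    ("ILSZ", 16, 15),
    ("ICSZ", 19, 17),
    ("RegWin", 24, 20),
    ("UMAC/SMAC", 25, 25),
    ("Watchpoints", 28, 26),
    ("SDRAM", 29, 29),
    ("DSU", 30, 30),
    ("Reserved", 31, 31),
]


def decode_leon_config(value):
    """Return a dict mapping each field name to its integer value."""
    fields = {}
    for name, high, low in LEON_CONFIG_FIELDS:
        width = high - low + 1
        fields[name] = (value >> low) & ((1 << width) - 1)
    return fields


cfg = decode_leon_config(0x1071_8C00)
register_windows = cfg["RegWin"] + 1        # field holds the window count minus 1
dcache_kbytes = 2 ** cfg["DCSZ"]            # data cache size in kbytes
dcache_line_words = 1 << cfg["DLSZ"]        # data cache line size in 32-bit words
```

Decoding the value this way gives RegWin = 7 (8 register windows), Watchpoints = 4, 1 kbyte caches with 8-word lines, and zero in every optional-feature field, which agrees with the SoPEC values quoted in the table and with the cache and register window configuration described in section 11.5.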

[1170] 11.6 Memory Management Unit (MMU)

[1171] Memory Management Units are typically used to protect certain regions of memory from invalid accesses, to perform address translation for a virtual memory system and to maintain memory page status (swapped-in, swapped-out or unmapped).

[1172] The SoPEC MMU is a much simpler affair whose function is to ensure that all regions of the SoPEC memory map are adequately protected. The MMU does not support virtual memory and physical addresses are used at all times. The SoPEC MMU supports a full 32-bit address space. The SoPEC memory map is depicted in FIG. 18 below.

[1173] The MMU selects the relevant bus protocol and generates the appropriate control signals depending on the area of memory being accessed. The MMU is responsible for performing the address decode and generation of the appropriate block select signal as well as the selection of the correct block read bus during a read access. The MMU will need to support all of the bus transactions the CPU can produce including interrupt acknowledge cycles, aborted transactions etc. When an MMU error occurs (such as an attempt to access a supervisor mode only region when in user mode) a bus error is generated. While the LEON can recognise different types of bus error (e.g. data store error, instruction access error) it handles them in the same manner as it handles all traps, i.e. it will transfer control to a trap handler. No extra state information is stored because of the nature of the trap. The location of the trap handler is contained in the TBR (Trap Base Register). This is the same mechanism as is used to handle interrupts.

[1174] 11.6.1 CPU-Bus Peripherals Address Map

[1175] The address mapping for the peripherals attached to the CPU-bus is shown in Table 17 below. The MMU performs the decode of the high order bits to generate the relevant cpu_block_select signal. Apart from the PCU, which decodes the address space for the PEP blocks, each block only needs to decode as many bits of cpu_adr[11:2] as required to address all the registers within the block.

TABLE 17 CPU-bus peripherals address map
Block_base | Address
ROM_base | 0x0000_0000
MMU_base | 0x0001_0000
TIM_base | 0x0001_1000
LSS_base | 0x0001_2000
GPIO_base | 0x0001_3000
SCB_base | 0x0001_4000
ICU_base | 0x0001_5000
CPR_base | 0x0001_6000
DIU_base | 0x0001_7000
PSS_base | 0x0001_8000
Reserved | 0x0001_9000 to 0x0001_FFFF
PCU_base | 0x0002_0000
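The high order decode can be sketched as follows. The base addresses are those of Table 17; the upper bounds of the ROM and PCU ranges are taken from the non-DRAM region descriptions in section 11.6.3, and the 4 kbyte window per peripheral follows from the spacing of the base addresses. The function names are assumptions, and the sketch ignores DRAM, access permissions and bus timing.

```python
# Peripherals occupying consecutive 4 kbyte windows from 0x0001_0000 (Table 17).
PERIPHERALS = ["MMU", "TIM", "LSS", "GPIO", "SCB", "ICU", "CPR", "DIU", "PSS"]


def block_select(addr):
    """Return the name of the addressed block, or None if no block is selected."""
    if 0x0000_0000 <= addr <= 0x0000_FFFF:
        return "ROM"
    if 0x0001_0000 <= addr <= 0x0001_8FFF:
        return PERIPHERALS[(addr >> 12) & 0xF]   # bits [15:12] pick the window
    if 0x0002_0000 <= addr <= 0x0002_BFFF:
        return "PCU"
    return None                                   # reserved, unused or DRAM


def register_index(addr):
    """Word index within a block, i.e. the value carried on cpu_adr[11:2]."""
    return (addr >> 2) & 0x3FF


selected = block_select(0x0001_3004)   # second 32-bit register of the GPIO block
index = register_index(0x0001_3004)
```

Here the MMU-side function only needs the high order bits, while each block looks at no more than the ten bits of cpu_adr[11:2], as the paragraph above describes.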

[1176] 11.6.2 DRAM Region Mapping

[1177] The embedded DRAM is broken into 8 regions, with each region defined by a lower and upper bound address and with its own access permissions.

[1178] The association of an area in the DRAM address space with an MMU region is completely under software control. Table 18 below gives one possible region mapping. Regions should be defined according to their access requirements and position in memory. Regions that share the same access requirements and that are contiguous in memory may be combined into a single region. The example below is purely for indicative purposes; real mappings are likely to differ significantly from this. Note that the RegionBottom and RegionTop fields in this example include the DRAM base address offset (0x4000_0000) which is not required when programming the RegionNTop and RegionNBottom registers. For more details, see 11.6.5.1 and 11.6.5.2.

TABLE 18 Example region mapping
Region | RegionBottom | RegionTop | Description
0 | 0x4000_0000 | 0x4000_0FFF | Silverbrook OS (supervisor) data
1 | 0x4000_1000 | 0x4000_BFFF | Silverbrook OS (supervisor) code
2 | 0x4000_C000 | 0x4000_C3FF | Silverbrook (supervisor/user) data
3 | 0x4000_C400 | 0x4000_CFFF | Silverbrook (supervisor/user) code
4 | 0x4026_D000 | 0x4026_D3FF | OEM (user) data
5 | 0x4026_D400 | 0x4026_DFFF | OEM (user) code
6 | 0x4027_E000 | 0x4027_FFFF | Shared Silverbrook/OEM space
7 | 0x4000_D000 | 0x4026_CFFF | Compressed page store (supervisor data)

[1179] 11.6.3 Non-DRAM Regions

[1180] As shown in FIG. 18 the DRAM occupies only 2.5 MBytes of the total 4 GB SoPEC address space. The non-DRAM regions of SoPEC are handled by the MMU as follows:

[1181] ROM (0x0000_0000 to 0x0000_FFFF): The ROM block will control the access types allowed. The cpu_acode[1:0] signals will indicate the CPU mode and access type and the ROM block will assert rom_cpu_berr if an attempted access is forbidden. The protocol is described in more detail in section 11.4.3. The ROM block access permissions are hard wired to allow all read accesses except to the FuseChipID registers which may only be read in supervisor mode.

[1182] MMU Internal Registers (0x0001_0000 to 0x0001_0FFF): The MMU is responsible for controlling the accesses to its own internal registers and will only allow data reads and writes (no instruction fetches) from supervisor data space. All other accesses will result in the mmu_cpu_berr signal being asserted in accordance with the CPU native bus protocol.

[1183] CPU Subsystem Peripheral Registers (0x0001_1000 to 0x0001_FFFF): Each peripheral block will control the access types allowed. Every peripheral will allow supervisor data accesses (both read and write) and some blocks (e.g. Timers and GPIO) will also allow user data space accesses as outlined in the relevant chapters of this specification. Neither supervisor nor user instruction fetch accesses are allowed to any block as it is not possible to execute code from peripheral registers. The bus protocol is described in section 11.4.3.

[1184] PCU Mapped Registers (0x0002_0000 to 0x0002_BFFF): All of the PEP block registers which are accessed by the CPU via the PCU will inherit the access permissions of the PCU. These access permissions are hard wired to allow supervisor data accesses only and the protocol used is the same as for the CPU peripherals.

[1185] Unused address space (0x0002_C000 to 0x3FFF_FFFF and 0x4028_0000 to 0xFFFF_FFFF): All accesses to the unused portion of the address space will result in the mmu_cpu_berr signal being asserted in accordance with the CPU native bus protocol. These accesses will not propagate outside of the MMU, i.e. no external access will be initiated.

[1186] 11.6.4 Reset Exception Vector and Reference Zero Traps

[1187] When a reset occurs the LEON processor starts executing code from address 0x0000_0000. A common software bug is zero-referencing or null pointer de-referencing (where the program attempts to access the contents of address 0x0000_0000). To assist software debug the MMU will assert a bus error every time the locations 0x0000_0000 to 0x0000_000F (i.e. the first 4 words of the reset trap) are accessed after the reset trap handler has legitimately been retrieved immediately after reset.

[1188] 11.6.5 MMU Configuration Registers

[1189] The MMU configuration registers include the RDU configuration registers and two LEON registers. Note that all the MMU configuration registers may only be accessed when the CPU is running in supervisor mode.

TABLE 19 MMU Configuration Registers
Address offset from MMU_base | Register | #bits | Reset | Description
0x00 | Region0Bottom[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the bottom of region 0
0x04 | Region0Top[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the top of region 0. Region 0 covers the entire address space after reset whereas all other regions are zero-sized initially.
0x08 | Region1Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 1
0x0C | Region1Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 1
0x10 | Region2Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 2
0x14 | Region2Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 2
0x18 | Region3Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 3
0x1C | Region3Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 3
0x20 | Region4Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 4
0x24 | Region4Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 4
0x28 | Region5Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 5
0x2C | Region5Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 5
0x30 | Region6Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 6
0x34 | Region6Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 6
0x38 | Region7Bottom[21:5] | 17 | 0xF_FFFF | This register contains the physical address that marks the bottom of region 7
0x3C | Region7Top[21:5] | 17 | 0x0_0000 | This register contains the physical address that marks the top of region 7
0x40 | Region0Control | 6 | 0x07 | Control register for region 0
0x44 | Region1Control | 6 | 0x07 | Control register for region 1
0x48 | Region2Control | 6 | 0x07 | Control register for region 2
0x4C | Region3Control | 6 | 0x07 | Control register for region 3
0x50 | Region4Control | 6 | 0x07 | Control register for region 4
0x54 | Region5Control | 6 | 0x07 | Control register for region 5
0x58 | Region6Control | 6 | 0x07 | Control register for region 6
0x5C | Region7Control | 6 | 0x07 | Control register for region 7
0x60 | RegionLock | 8 | 0x00 | Writing a 1 to a bit in the RegionLock register locks the value of the corresponding RegionTop, RegionBottom and RegionControl registers. The lock can only be cleared by a reset and any attempt to write to a locked register will result in a bus error.
0x64 | BusTimeout | 8 | 0xFF | This register should be set to the number of pclk cycles to wait after an access has started before aborting the access with a bus error. Writing 0 to this register disables the bus timeout feature.
0x68 | ExceptionSource | 6 | 0x00 | This register identifies the source of the last exception. See Section 11.6.5.3 for details.
0x6C | DebugSelect | 7 | 0x00 | Contains address of the register selected for debug observation. It is expected that a number of pseudo-registers will be made available for debug observation and these will be outlined during the implementation phase.
0x80 to 0x108 | RDU Registers | | | See Table for details.
0x140 | LEON Configuration Register | 32 | 0x1071_8C00 | The LEON configuration register is used by software to determine the configuration of this LEON implementation. See section 11.5.1.1 for details. This register is ReadOnly.
0x144 | LEON Cache Control Register | 32 | 0x0000_0000 | The LEON Cache Control Register is used to control the operation of the caches. See section 11.6 for details.

[1190] 11.6.5.1 RegionTop and RegionBottom Registers

[1191] The 20 Mbit of embedded DRAM on SoPEC is arranged as 81920 words of 256 bits each. All region boundaries need to align with a 256-bit word. Thus only 17 bits are required for the RegionNTop and RegionNBottom registers. Note that the bottom 5 bits of the RegionNTop and RegionNBottom registers cannot be written to and read as '0', i.e. the RegionNTop and RegionNBottom registers represent byte addresses in DRAM that are aligned to 256-bit words.

[1192] Both the RegionNTop and RegionNBottom registers are inclusive, i.e. the addresses in the registers are included in the region. Thus the size of a region is (RegionNTop - RegionNBottom) + 1 DRAM words.
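The region arithmetic can be made concrete with a short sketch. It assumes that the value of interest in each register is the [21:5] field, i.e. a 256-bit word index obtained by removing the DRAM base offset (0x4000_0000) and discarding the bottom 5 bits; the function names are illustrative and do not appear in the specification.

```python
DRAM_BASE = 0x4000_0000     # DRAM base address offset in the CPU memory map
DRAM_WORD_BYTES = 32        # one 256-bit DRAM word
DRAM_WORDS = 20 * 1024 * 1024 // 256   # 20 Mbit arranged as 256-bit words


def region_field(cpu_address):
    """Return bits [21:5] of the DRAM offset, i.e. the 256-bit word index."""
    offset = cpu_address - DRAM_BASE
    return (offset >> 5) & 0x1FFFF      # 17-bit field


def region_size_words(bottom_field, top_field):
    """Inclusive region size in DRAM words; zero if top is below bottom."""
    if top_field < bottom_field:
        return 0
    return (top_field - bottom_field) + 1


# Region 0 of the example mapping: 0x4000_0000 to 0x4000_0FFF.
region0_words = region_size_words(region_field(0x4000_0000),
                                  region_field(0x4000_0FFF))
region0_bytes = region0_words * DRAM_WORD_BYTES
```

For region 0 of the example mapping this gives 128 DRAM words, or 4096 bytes, and the highest DRAM address 0x4027_FFFF maps to word index 81919, consistent with the 81920-word arrangement stated above.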

[1193] If DRAM regions overlap (there is no reason for this to be the case but there is nothing to prohibit it either) then only accesses allowed by all overlapping regions are permitted. That is, if a DRAM address appears in both Region1 and Region3 (for example) the cpu_acode of an access is checked against the access permissions of both regions. If both regions permit the access then it will proceed but if either or both regions do not permit the access then it will not be allowed. The MMU does not support negatively sized regions, i.e. the value of the RegionNTop register should always be greater than or equal to the value of the RegionNBottom register. If RegionNTop is lower in the address map than RegionNBottom then the region is considered to be zero-sized and is ignored.

[1194] When both the RegionNTop and RegionNBottom registers for a region contain the same value the region is then simply one 256-bit word in length and this corresponds to the smallest possible active region.

[1195] 11.6.5.2 Region Control Registers

[1196] Each memory region has a control register associated with it. The RegionNControl register is used to set the access conditions for the memory region bounded by the RegionNTop and RegionNBottom registers. Table 20 describes the function of each bit field in the RegionNControl registers. All bits in a RegionNControl register are both readable and writable by design. However, like all registers in the MMU, the RegionNControl registers can only be accessed by code running in supervisor mode.

TABLE 20 Region Control Register
Field Name | bit(s) | Description
SupervisorAccess | 2:0 | Denotes the type of access allowed when the CPU is running in Supervisor mode. For each access type a 1 indicates the access is permitted and a 0 indicates the access is not permitted. bit0 - Data read access permission; bit1 - Data write access permission; bit2 - Instruction fetch access permission
UserAccess | 5:3 | Denotes the type of access allowed when the CPU is running in User mode. For each access type a 1 indicates the access is permitted and a 0 indicates the access is not permitted. bit3 - Data read access permission; bit4 - Data write access permission; bit5 - Instruction fetch access permission
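The combined effect of the region bounds and the control bits can be sketched as below. The bit positions follow Table 20 and the rule that every overlapping region must permit an access follows section 11.6.5.1; the data layout (a list of bottom, top and control tuples in DRAM word units) and the function names are assumptions for illustration only.

```python
ACCESS_KINDS = {"read": 0, "write": 1, "fetch": 2}   # order within each 3-bit group


def permission_bit(supervisor, kind):
    """Bit position in RegionNControl for the given CPU mode and access kind."""
    base = 0 if supervisor else 3     # SupervisorAccess is 2:0, UserAccess is 5:3
    return base + ACCESS_KINDS[kind]


def dram_access_allowed(word, supervisor, kind, regions):
    """Check an access to DRAM word index `word`.

    `regions` is a list of (bottom, top, control) tuples with inclusive bounds.
    Regions whose top is below their bottom never match, so they are ignored.
    """
    bit = permission_bit(supervisor, kind)
    matching = [control for bottom, top, control in regions
                if bottom <= word <= top]
    if not matching:
        return False                  # address lies outside every defined region
    return all((control >> bit) & 1 for control in matching)


# Two overlapping regions: supervisor-only (0x07) and fully open (0x3F).
example_regions = [(0, 127, 0x07), (100, 200, 0x3F)]
```

With these two example regions a user read of word 150 is allowed, but a user read of word 110 is refused because the supervisor-only region also covers that word; the refusal would correspond to a bus error with DramAccessExcptn recorded.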

[1197] 11.6.5.3 ExceptionSource Register

[1198] The SPARC V8 architecture allows for a number of types of memory access error to be trapped. These trap types and trap handling in general are described in chapter 7 of the SPARC architecture manual [36]. However on the LEON processor only data_store_error and data_access_exception trap types will result from an external (to LEON) bus error. According to the SPARC architecture manual the processor will automatically move to the next register window (i.e. it decrements the current window pointer) and copy the program counters (PC and nPC) to two local registers in the new window. The supervisor bit in the PSR is also set and the PSR can be saved to another local register by the trap handler (this does not happen automatically in hardware). The ExceptionSource register aids the trap handler by identifying the source of an exception. Each bit in the ExceptionSource register is set when the relevant trap condition occurs and should be cleared by the trap handler by writing a '1' to that bit position.

TABLE 21 ExceptionSource Register
Field Name | bit(s) | Description
DramAccessExcptn | 0 | The permissions of an access did not match those of the DRAM region it was attempting to access. This bit will also be set if an attempt is made to access an undefined DRAM region (i.e. a location that is not within the bounds of any RegionTop/RegionBottom pair)
PeriAccessExcptn | 1 | An access violation occurred when accessing a CPU subsystem block. This occurs when the access permissions disagree with those set by the block.
UnusedAreaExcptn | 2 | An attempt was made to access an unused part of the memory map
LockedWriteExcptn | 3 | An attempt was made to write to a region's registers (RegionTop/Bottom/Control) after they had been locked.
ResetHandlerExcptn | 4 | An attempt was made to access a ROM location between 0x0000_0000 and 0x0000_000F after the reset handler was executed. The most likely cause of such an access is the use of an uninitialised pointer or structure.
TimeoutExcptn | 5 | A bus timeout condition occurred.
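The set-on-trap, write-one-to-clear behaviour of the ExceptionSource register can be modelled in a few lines. The bit assignments are those of Table 21; the class and method names are hypothetical and the model says nothing about hardware timing.

```python
EXCEPTION_BITS = {
    "DramAccessExcptn": 0,
    "PeriAccessExcptn": 1,
    "UnusedAreaExcptn": 2,
    "LockedWriteExcptn": 3,
    "ResetHandlerExcptn": 4,
    "TimeoutExcptn": 5,
}


class ExceptionSource:
    """Software model of a 6-bit register with write-1-to-clear bits."""

    def __init__(self):
        self.value = 0x00                      # reset value

    def record(self, name):
        """Hardware side: set the bit for the trap condition that occurred."""
        self.value |= 1 << EXCEPTION_BITS[name]

    def write(self, data):
        """CPU side: each 1 written clears the corresponding bit."""
        self.value &= ~data & 0x3F

    def pending(self):
        """Names of all exception sources currently flagged."""
        return [name for name, bit in EXCEPTION_BITS.items()
                if (self.value >> bit) & 1]


reg = ExceptionSource()
reg.record("TimeoutExcptn")
reg.record("UnusedAreaExcptn")
flagged_before = reg.value
reg.write(1 << EXCEPTION_BITS["TimeoutExcptn"])   # handler acknowledges the timeout
```

After the handler writes a 1 to bit 5 only the UnusedAreaExcptn flag remains set, so a trap handler can service and acknowledge one source at a time without disturbing the others.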

[1199] 11.6.6 MMU Sub-Block Partition

[1200] As can be seen from FIG. 19 and FIG. 20 the MMU consists of three principal sub-blocks. For clarity the connections between these sub-blocks and other SoPEC blocks and between each of the sub-blocks are shown in two separate diagrams.

[1201] 11.6.6.1 LEON AHB Bridge

[1202] The LEON AHB bridge consists of an AHB bridge to DIU and an AHB to CPU subsystem bus bridge. The AHB bridge will convert between the AHB and the DIU and CPU subsystem bus protocols but the address decoding and enabling of an access happens elsewhere in the MMU. The AHB bridge will always be a slave on the AHB. Note that the AMBA signals from the LEON core are contained within the ahbso and ahbsi records. The LEON records are described in more detail in section 11.7. Glue logic may be required to assist with enabling memory accesses, endianness coherency, interrupts and other miscellaneous signalling.

TABLE 22 LEON AHB bridge I/Os
Port name | Pins | I/O | Description
Global SoPEC signals
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
pclk | 1 | In | Global clock
LEON core to LEON AHB signals (ahbsi and ahbso records)
ahbsi.haddr[31:0] | 32 | In | AHB address bus
ahbsi.hwdata[31:0] | 32 | In | AHB write data bus
ahbso.hrdata[31:0] | 32 | Out | AHB read data bus
ahbsi.hsel | 1 | In | AHB slave select signal
ahbsi.hwrite | 1 | In | AHB write signal: 1 - Write access; 0 - Read access
ahbsi.htrans | 2 | In | Indicates the type of the current transfer: 00 - IDLE; 01 - BUSY; 10 - NONSEQ; 11 - SEQ
ahbsi.hsize | 3 | In | Indicates the size of the current transfer: 000 - Byte transfer; 001 - Halfword transfer; 010 - Word transfer; 011 - 64-bit transfer (unsupported?); 1xx - Unsupported larger word sizes
ahbsi.hburst | 3 | In | Indicates if the current transfer forms part of a burst and the type of burst: 000 - SINGLE; 001 - INCR; 010 - WRAP4; 011 - INCR4; 100 - WRAP8; 101 - INCR8; 110 - WRAP16; 111 - INCR16
ahbsi.hprot | 4 | In | Protection control signals pertaining to the current access: hprot[0] - Opcode(0)/Data(1) access; hprot[1] - User(0)/Supervisor(1) access; hprot[2] - Non-bufferable(0)/Bufferable(1) access (unsupported); hprot[3] - Non-cacheable(0)/Cacheable(1) access
ahbsi.hmaster | 4 | In | Indicates the identity of the current bus master. This will always be the LEON core.
ahbsi.hmastlock | 1 | In | Indicates that the current master is performing a locked sequence of transfers.
ahbso.hready | 1 | Out | Active high ready signal indicating the access has completed
ahbso.hresp | 2 | Out | Indicates the status of the transfer: 00 - OKAY; 01 - ERROR; 10 - RETRY; 11 - SPLIT
ahbso.hsplit[15:0] | 16 | Out | This 16-bit split bus is used by a slave to indicate to the arbiter which bus masters should be allowed to attempt a split transaction. This feature will be unsupported on the AHB bridge
Toplevel/Common LEON AHB bridge signals
cpu_dataout[31:0] | 32 | Out | Data out bus to both DRAM and peripheral devices.
cpu_rwn | 1 | Out | Read/NotWrite signal. 1 = Current access is a read access, 0 = Current access is a write access
icu_cpu_ilevel[3:0] | 4 | In | An interrupt is asserted by driving the appropriate priority level on icu_cpu_ilevel. These signals must remain asserted until the CPU executes an interrupt acknowledge cycle.
cpu_icu_ilevel[3:0] | 4 | In | Indicates the level of the interrupt the CPU is acknowledging when cpu_iack is high
cpu_iack | 1 | Out | Interrupt acknowledge signal. The exact timing depends on the CPU core implementation
cpu_start_access | 1 | Out | Start Access signal indicating the start of a data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and cpu_acode signals are all valid. This signal is only asserted during the first cycle of an access.
cpu_ben[1:0] | 2 | Out | Byte enable signals.
dram_cpu_data[255:0] | 256 | In | Read data from the DRAM.
diu_cpu_rreq | 1 | Out | Read request to the DIU.
diu_cpu_rack | 1 | In | Acknowledge from DIU that read request has been accepted.
diu_cpu_rvalid | 1 | In | Signal from DIU indicating that valid read data is on the dram_cpu_data bus
cpu_diu_wdatavalid | 1 | Out | Signal from the CPU to the DIU indicating that the data currently on the cpu_diu_wdata bus is valid and should be committed to the DIU posted write buffer
diu_cpu_write_rdy | 1 | In | Signal from the DIU indicating that the posted write buffer is empty
cpu_diu_wdadr[21:4] | 18 | Out | Write address bus to the DIU
cpu_diu_wdata[127:0] | 128 | Out | Write data bus to the DIU
cpu_diu_wmask[15:0] | 16 | Out | Write mask for the cpu_diu_wdata bus. Each bit corresponds to a byte of the 128-bit cpu_diu_wdata bus.
LEON AHB bridge to MMU Control Block signals
cpu_mmu_adr | 32 | Out | CPU Address Bus.
mmu_cpu_data | 32 | In | Data bus from the MMU
mmu_cpu_rdy | 1 | In | Ready signal from the MMU
cpu_mmu_acode | 2 | Out | Access code signals to the MMU
mmu_cpu_berr | 1 | In | Bus error signal from the MMU
dram_access_en | 1 | In | DRAM access enable signal. A DRAM access cannot be initiated unless it has been enabled by the MMU control unit.

[1203] Description:

[1204] The LEON AHB bridge must ensure that all CPU bus transactions are functionally correct and that the timing requirements are met. The AHB bridge also implements a 128-bit DRAM write buffer to improve the efficiency of DRAM writes, particularly for multiple successive writes to DRAM. The AHB bridge is also responsible for ensuring endianness coherency, i.e. guaranteeing that the correct data appears in the correct position on the data buses (hrdata, cpu_dataout and cpu_mmu_wdata) for every type of access. This is a requirement because the LEON uses big-endian addressing while the rest of SoPEC is little-endian.

[1205] The LEON AHB bridge will assert request signals to the DIU if the MMU control block deems the access to be a legal access. The validity of an access (i.e. whether the CPU is running in the correct mode for the address space being accessed) is determined by the contents of the relevant RegionNControl register. The SPARC standard requires that all accesses are aligned to their word size (i.e. byte, half-word, word or double-word), so it is not possible for an access to traverse a 256-bit boundary (as required by the DIU). Invalid DRAM accesses are not propagated to the DIU and will result in an error response (ahbso.hresp=‘01’) on the AHB. The DIU bus protocol is described in more detail in section 20.9. The DIU will return a 256-bit dataword on dram_cpu_data[255:0] for every read access.
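
The alignment argument can be checked exhaustively. The sketch below (the helper name is hypothetical) confirms that a naturally aligned access of 1, 2, 4 or 8 bytes never straddles a 32-byte (256-bit) DRAM line, whereas a misaligned access can.

```python
LINE_BYTES = 32  # one 256-bit DRAM line


def crosses_line(addr, size):
    """True if the access [addr, addr + size) spans two 256-bit lines."""
    return addr // LINE_BYTES != (addr + size - 1) // LINE_BYTES


# Every naturally aligned access within the first few lines stays inside one line.
aligned_ok = all(
    not crosses_line(addr, size)
    for size in (1, 2, 4, 8)
    for addr in range(0, 4 * LINE_BYTES, size)
)

# A misaligned word access, by contrast, can straddle a line boundary.
misaligned_crosses = crosses_line(30, 4)
```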

[1206] The CPU subsystem bus protocol is described in section 11.4.3. While the LEON AHB bridge performs the protocol translation between AHB and the CPU subsystem bus, the select signals for each block are generated by address decoding in the CPU subsystem bus interface. The CPU subsystem bus interface also selects the correct read data bus, ready and error signals for the block being addressed and passes these to the LEON AHB bridge which puts them on the AHB bus. It is expected that some signals (especially those external to the CPU block) will need to be registered here to meet the timing requirements. Careful thought will be required to ensure that overall CPU access times are not excessively degraded by the use of too many register stages.

[1207] 11.6.6.1.1 DRAM Write Buffer

[1208] The DRAM write buffer improves the efficiency of DRAM writes by aggregating a number of CPU write accesses into a single DIU write access. This is achieved by checking to see if a CPU write is to an address already in the write buffer and if so the write is immediately acknowledged (i.e. the ahbso.hready signal is asserted without any wait states) and the DRAM write buffer updated accordingly. When the CPU write is to a DRAM address other than that in the write buffer then the current contents of the write buffer are sent to the DIU (where they are placed in the posted write buffer) and the DRAM write buffer is updated with the address and data of the CPU write. The DRAM write buffer consists of a 128-bit data buffer, an 18-bit write address tag and a 16-bit write mask. Each bit of the write mask indicates the validity of the corresponding byte of the write buffer as shown in FIG. 21 below.

[1209] The operation of the DRAM write buffer is summarised by the following set of rules:

[1210] 1) The DRAM write buffer only contains DRAM write data, i.e. peripheral writes go directly to the addressed peripheral.

[1211] 2) CPU writes to locations within the DRAM write buffer or to an empty write buffer (i.e. the write mask bits are all 0) complete with zero wait states regardless of the size of the write (byte/half-word/word/double-word).

[1212] 3) The contents of the DRAM write buffer are flushed to DRAM whenever a CPU write to a location outside the write buffer occurs, whenever a CPU read from a location within the write buffer occurs or whenever a write to a peripheral register occurs.

[1213] 4) A flush resulting from a peripheral write will not cause any extra wait states to be inserted in the peripheral write access.

[1214] 5) Flushes resulting from DRAM accesses will cause wait states to be inserted until the DIU posted write buffer is empty. If the DIU posted write buffer is empty at the time the flush is required then no wait states will be inserted for a flush resulting from a CPU write, or one wait state will be inserted for a flush resulting from a CPU read (this is to ensure that the DIU sees the write request ahead of the read request). Note that in this case further wait states will also be inserted as a result of the delay in servicing the read request by the DIU.
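
The merge and flush behaviour in rules 1 to 3 can be illustrated with a small behavioural model. The sketch below is not the RTL: the class and method names are hypothetical, wait states and the DIU handshake are not modelled, flushed contents are simply appended to a list, and it is assumed that bit n of the write mask corresponds to byte offset n within the 128-bit word.

```python
class DramWriteBuffer:
    """Behavioural model: 16 data bytes, an 18-bit address tag and a 16-bit byte mask."""

    def __init__(self):
        self.tag = None              # address bits [21:4] of the buffered 128-bit word
        self.data = bytearray(16)
        self.mask = 0                # bit n set => byte n of `data` is valid
        self.flushed = []            # (tag, data, mask) tuples handed to the DIU

    @staticmethod
    def _tag(addr):
        return (addr >> 4) & 0x3FFFF

    def flush(self):
        if self.mask:
            self.flushed.append((self.tag, bytes(self.data), self.mask))
        self.data = bytearray(16)
        self.mask = 0

    def cpu_write(self, addr, payload):
        tag, offset = self._tag(addr), addr & 0xF
        if self.mask and tag != self.tag:
            self.flush()             # rule 3: write to a location outside the buffer
        self.tag = tag
        for i, byte in enumerate(payload):
            self.data[offset + i] = byte
            self.mask |= 1 << (offset + i)

    def cpu_read(self, addr):
        if self.mask and self._tag(addr) == self.tag:
            self.flush()             # rule 3: read from a buffered location

    def peripheral_write(self):
        self.flush()                 # rule 3: any write to a peripheral register


buf = DramWriteBuffer()
buf.cpu_write(0x100, b"\x11\x22\x33\x44")   # fills an empty buffer
buf.cpu_write(0x104, b"\x55\x66\x77\x88")   # same 128-bit word: merged
buf.cpu_write(0x200, b"\xAA")               # different word: forces a flush
```

After the third write, the first two words have been handed on as a single entry with the low eight mask bits set, and the buffer holds only the new byte.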

[1215] 11.6.6.1.2 DIU Interface Waveforms

[1216] FIG. 22 below depicts the operation of the AHB bridge over a sample sequence of DRAM transactions consisting of a read into the DCache, a double-word store to an address other than that currently in the DRAM write buffer followed by an ICache line refill. To avoid clutter a number of AHB control signals that are inputs to the MMU have been grouped together as ahbsi.CONTROL and only the ahbso.HREADY is shown of the output AHB control signals.

[1217] The first transaction is a single word load (‘LD’). The MMU (specifically the MMU control block) uses the first cycle of every access (i.e. the address phase of an AHB transaction) to determine whether or not the access is a legal access. The read request to the DIU is then asserted in the following cycle (assuming the access is a valid one) and is acknowledged by the DIU a cycle later. Note that the time between cpu_diu_rreq being asserted and diu_cpu_rack being asserted is variable as it depends on the DIU configuration and access patterns of DIU requestors. The AHB bridge will insert wait states until it sees the diu_cpu_rvalid signal is high, indicating the data (‘LD1’) on the dram_cpu_data bus is valid. The AHB bridge terminates the read access in the same cycle by asserting the ahbso.HREADY signal (together with an ‘OKAY’ HRESP code). The AHB bridge also selects the appropriate 32 bits (‘RD1’) from the 256-bit DRAM line data (‘LD1’) returned by the DIU corresponding to the word address given by A1.

[1218] The second transaction is an AHB two-beat incrementing burst issued by the LEON acache block in response to the execution of a double-word store instruction. As LEON is a big endian processor the address issued (‘A2’) during the address phase of the first beat of this transaction is the address of the most significant word of the double-word while the address for the second beat (‘A3’) is that of the least significant word, i.e. A3=A2+4. The presence of the DRAM write buffer allows these writes to complete without the insertion of any wait states. This is true even when, as shown here, the DRAM write buffer needs to be flushed into the DIU posted write buffer, provided the DIU posted write buffer is empty. If the DIU posted write buffer is not empty (as would be signified by diu_cpu_write_rdy being low) then wait states would be inserted until it became empty. The cpu_diu_wdata buffer builds up the data to be written to the DIU over a number of transactions (‘BD1’ and ‘BD2’ here) while the cpu_diu_wmask records every byte that has been written to since the last flush; in this case the lowest word and then the second lowest word are written to as a result of the double-word store operation.

[1219] The final transaction shown here is a DRAM read caused by an ICache miss. Note that the pipelined nature of the AHB bus allows the address phase of this transaction to overlap with the final data phase of the previous transaction. All ICache misses appear as single word loads (‘LD’) on the AHB bus. In this case we can see that the DIU is slower to respond to this read request than to the first read request because it is processing the write access caused by the DRAM write buffer flush. The ICache refill will complete just after the window shown in FIG. 22.

[1220] 11.6.6.2 CPU Subsystem Bus Interface

[1221] The CPU Subsystem Interface block handles all valid accesses to the peripheral blocks that comprise the CPU Subsystem.

TABLE 23 CPU Subsystem Bus Interface I/Os
Port name | Pins | I/O | Description
Global SoPEC signals
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
pclk | 1 | In | Global clock
Toplevel/Common CPU Subsystem Bus Interface signals
cpu_cpr_sel | 1 | Out | CPR block select.
cpu_gpio_sel | 1 | Out | GPIO block select.
cpu_icu_sel | 1 | Out | ICU block select.
cpu_lss_sel | 1 | Out | LSS block select.
cpu_pcu_sel | 1 | Out | PCU block select.
cpu_scb_sel | 1 | Out | SCB block select.
cpu_tim_sel | 1 | Out | Timers block select.
cpu_rom_sel | 1 | Out | ROM block select.
cpu_pss_sel | 1 | Out | PSS block select.
cpu_diu_sel | 1 | Out | DIU block select.
cpr_cpu_data[31:0] | 32 | In | Read data bus from the CPR block
gpio_cpu_data[31:0] | 32 | In | Read data bus from the GPIO block
icu_cpu_data[31:0] | 32 | In | Read data bus from the ICU block
lss_cpu_data[31:0] | 32 | In | Read data bus from the LSS block
pcu_cpu_data[31:0] | 32 | In | Read data bus from the PCU block
scb_cpu_data[31:0] | 32 | In | Read data bus from the SCB block
tim_cpu_data[31:0] | 32 | In | Read data bus from the Timers block
rom_cpu_data[31:0] | 32 | In | Read data bus from the ROM block
pss_cpu_data[31:0] | 32 | In | Read data bus from the PSS block
diu_cpu_data[31:0] | 32 | In | Read data bus from the DIU block
cpr_cpu_rdy | 1 | In | Ready signal to the CPU. When cpr_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the CPR block and for a read cycle this means the data on cpr_cpu_data is valid.
gpio_cpu_rdy | 1 | In | GPIO ready signal to the CPU.
icu_cpu_rdy | 1 | In | ICU ready signal to the CPU.
lss_cpu_rdy | 1 | In | LSS ready signal to the CPU.
pcu_cpu_rdy | 1 | In | PCU ready signal to the CPU.
scb_cpu_rdy | 1 | In | SCB ready signal to the CPU.
tim_cpu_rdy | 1 | In | Timers block ready signal to the CPU.
rom_cpu_rdy | 1 | In | ROM block ready signal to the CPU.
pss_cpu_rdy | 1 | In | PSS block ready signal to the CPU.
diu_cpu_rdy | 1 | In | DIU register block ready signal to the CPU.
cpr_cpu_berr | 1 | In | Bus Error signal from the CPR block
gpio_cpu_berr | 1 | In | Bus Error signal from the GPIO block
icu_cpu_berr | 1 | In | Bus Error signal from the ICU block
lss_cpu_berr | 1 | In | Bus Error signal from the LSS block
pcu_cpu_berr | 1 | In | Bus Error signal from the PCU block
scb_cpu_berr | 1 | In | Bus Error signal from the SCB block
tim_cpu_berr | 1 | In | Bus Error signal from the Timers block
rom_cpu_berr | 1 | In | Bus Error signal from the ROM block
pss_cpu_berr | 1 | In | Bus Error signal from the PSS block
diu_cpu_berr | 1 | In | Bus Error signal from the DIU block
CPU Subsystem Bus Interface to MMU Control Block signals
cpu_adr[19:12] | 8 | In | Toplevel CPU Address bus. Only bits 19-12 are required to decode the peripherals address space
peri_access_en | 1 | In | Enable Access signal. A peripheral access cannot be initiated unless it has been enabled by the MMU Control Unit
peri_mmu_data[31:0] | 32 | Out | Data bus from the selected peripheral
peri_mmu_rdy | 1 | Out | Data Ready signal. Indicates the data on the peri_mmu_data bus is valid for a read cycle or that the data was successfully written to the peripheral for a write cycle.
peri_mmu_berr | 1 | Out | Bus Error signal. Indicates a bus error has occurred in accessing the selected peripheral
CPU Subsystem Bus Interface to LEON AHB bridge signals
cpu_start_access | 1 | In | Start Access signal from the LEON AHB bridge indicating the start of a data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and cpu_acode signals are all valid. This signal is only asserted during the first cycle of an access.

[1222] Description:

[1223] The CPU Subsystem Bus Interface block performs simple address decoding to select a peripheral and multiplexing of the returned signals from the various peripheral blocks. The base addresses used for the decode operation are defined in Table. Note that accesses to the MMU configuration registers are handled by the MMU Control Block rather than the CPU Subsystem Bus Interface block. The CPU Subsystem Bus Interface block operation is described by the following pseudocode:

masked_cpu_adr = cpu_adr[17:12]
case (masked_cpu_adr)
    when TIM_base[17:12]
        cpu_tim_sel = peri_access_en     // The peri_access_en signal will have the
        peri_mmu_data = tim_cpu_data     // timing required for block selects
        peri_mmu_rdy = tim_cpu_rdy
        peri_mmu_berr = tim_cpu_berr
        all_other_selects = 0            // Shorthand to ensure other cpu_block_sel signals
                                         // remain deasserted
    when LSS_base[17:12]
        cpu_lss_sel = peri_access_en
        peri_mmu_data = lss_cpu_data
        peri_mmu_rdy = lss_cpu_rdy
        peri_mmu_berr = lss_cpu_berr
        all_other_selects = 0
    when GPIO_base[17:12]
        cpu_gpio_sel = peri_access_en
        peri_mmu_data = gpio_cpu_data
        peri_mmu_rdy = gpio_cpu_rdy
        peri_mmu_berr = gpio_cpu_berr
        all_other_selects = 0
    when SCB_base[17:12]
        cpu_scb_sel = peri_access_en
        peri_mmu_data = scb_cpu_data
        peri_mmu_rdy = scb_cpu_rdy
        peri_mmu_berr = scb_cpu_berr
        all_other_selects = 0
    when ICU_base[17:12]
        cpu_icu_sel = peri_access_en
        peri_mmu_data = icu_cpu_data
        peri_mmu_rdy = icu_cpu_rdy
        peri_mmu_berr = icu_cpu_berr
        all_other_selects = 0
    when CPR_base[17:12]
        cpu_cpr_sel = peri_access_en
        peri_mmu_data = cpr_cpu_data
        peri_mmu_rdy = cpr_cpu_rdy
        peri_mmu_berr = cpr_cpu_berr
        all_other_selects = 0
    when ROM_base[17:12]
        cpu_rom_sel = peri_access_en
        peri_mmu_data = rom_cpu_data
        peri_mmu_rdy = rom_cpu_rdy
        peri_mmu_berr = rom_cpu_berr
        all_other_selects = 0
    when PSS_base[17:12]
        cpu_pss_sel = peri_access_en
        peri_mmu_data = pss_cpu_data
        peri_mmu_rdy = pss_cpu_rdy
        peri_mmu_berr = pss_cpu_berr
        all_other_selects = 0
    when DIU_base[17:12]
        cpu_diu_sel = peri_access_en
        peri_mmu_data = diu_cpu_data
        peri_mmu_rdy = diu_cpu_rdy
        peri_mmu_berr = diu_cpu_berr
        all_other_selects = 0
    when PCU_base[17:12]
        cpu_pcu_sel = peri_access_en
        peri_mmu_data = pcu_cpu_data
        peri_mmu_rdy = pcu_cpu_rdy
        peri_mmu_berr = pcu_cpu_berr
        all_other_selects = 0
    when others
        all_block_selects = 0
        peri_mmu_data = 0x00000000
        peri_mmu_rdy = 0
        peri_mmu_berr = 1
end case
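
Because every branch of the case statement has the same shape, the decode can also be expressed as a table lookup. The Python sketch below illustrates this for three blocks only. The values used for address bits [17:12] are hypothetical placeholders (the real base addresses are those of the address-map table), and the function and dictionary names are not taken from the design.

```python
# Hypothetical values of cpu_adr[17:12] for three of the blocks.
BLOCK_BASES = {0x01: "tim", 0x02: "lss", 0x03: "gpio"}


def decode(cpu_adr, peri_access_en, block_outputs):
    """Return (selects, data, rdy, berr) for one access.

    block_outputs maps block name -> (data, rdy, berr) as driven by that block.
    """
    masked = (cpu_adr >> 12) & 0x3F              # cpu_adr[17:12]
    selects = {name: 0 for name in BLOCK_BASES.values()}
    block = BLOCK_BASES.get(masked)
    if block is None:                             # the "when others" branch
        return selects, 0x00000000, 0, 1
    selects[block] = peri_access_en               # all other selects stay at 0
    data, rdy, berr = block_outputs[block]
    return selects, data, rdy, berr


outputs = {"tim": (0xCAFE, 1, 0), "lss": (0, 0, 0), "gpio": (0, 0, 0)}
hit = decode(0x00001004, 1, outputs)    # falls in the (hypothetical) timers page
miss = decode(0x0003F000, 1, outputs)   # no block mapped: bus error
```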

[1224] 11.6.6.3 MMU Control Block

[1225] The MMU Control Block determines whether every CPU access is a valid access. No more than one cycle is to be consumed in determining the validity of an access and all accesses must terminate with the assertion of either mmu_cpu_rdy or mmu_cpu_berr. To safeguard against stalling the CPU a simple bus timeout mechanism will be supported.

TABLE 24 MMU Control Block I/Os
Port name | Pins | I/O | Description
Global SoPEC signals
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
pclk | 1 | In | Global clock
Toplevel/Common MMU Control Block signals
cpu_adr[21:2] | 22 | Out | Address bus for both DRAM and peripheral access.
cpu_acode[1:0] | 2 | Out | CPU access code signals (cpu_mmu_acode) retimed to meet the CPU Subsystem Bus timing requirements
dram_access_en | 1 | Out | DRAM Access Enable signal. Indicates that the current CPU access is a valid DRAM access.
MMU Control Block to LEON AHB bridge signals
cpu_mmu_adr[31:0] | 32 | In | CPU core address bus.
cpu_dataout[31:0] | 32 | In | Toplevel CPU data bus
mmu_cpu_data[31:0] | 32 | Out | Data bus to the CPU core. Carries the data for all CPU read operations
cpu_rwn | 1 | In | Toplevel CPU Read/notWrite signal.
cpu_mmu_acode[1:0] | 2 | In | CPU access code signals
mmu_cpu_rdy | 1 | Out | Ready signal to the CPU core. Indicates the completion of all valid CPU accesses.
mmu_cpu_berr | 1 | Out | Bus Error signal to the CPU core. This signal is asserted to terminate an invalid access.
cpu_start_access | 1 | In | Start Access signal from the LEON AHB bridge indicating the start of a data transfer and that the cpu_adr, cpu_dataout, cpu_rwn and cpu_acode signals are all valid. This signal is only asserted during the first cycle of an access.
cpu_iack | 1 | In | Interrupt Acknowledge signal from the CPU. This signal is only asserted during an interrupt acknowledge cycle.
cpu_ben[1:0] | 2 | In | Byte enable signals indicating which bytes of the 32-bit bus are being accessed.
MMU Control Block to CPU Subsystem Bus Interface signals
cpu_adr[17:12] | 8 | Out | Toplevel CPU Address bus. Only bits 17-12 are required to decode the peripherals address space
peri_access_en | 1 | Out | Enable Access signal. A peripheral access cannot be initiated unless it has been enabled by the MMU Control Unit
peri_mmu_data[31:0] | 32 | In | Data bus from the selected peripheral
peri_mmu_rdy | 1 | In | Data Ready signal. Indicates the data on the peri_mmu_data bus is valid for a read cycle or that the data was successfully written to the peripheral for a write cycle.
peri_mmu_berr | 1 | In | Bus Error signal. Indicates a bus error has occurred in accessing the selected peripheral

[1226] Description:

[1227] The MMU Control Block is responsible for the MMU's core functionality, namely determining whether or not an access to any part of the address map is valid. An access is considered valid if it is to a mapped area of the address space and if the CPU is running in the appropriate mode for that address space. Furthermore the MMU control block must correctly handle the following special cases: an interrupt acknowledge cycle, a reset exception vector fetch, an access that crosses a 256-bit DRAM word boundary and a bus timeout condition. The following pseudocode shows the logic required to implement the MMU Control Block functionality. It does not deal with the timing relationships of the various signals; it is the designer's responsibility to ensure that these relationships are correct and comply with the different bus protocols. For simplicity the pseudocode is split up into numbered sections so that the functionality may be seen more easily.

[1228] It is important to note that the style used for the pseudocode will differ from the actual coding style used in the RTL implementation. The pseudocode is only intended to capture the required functionality and to show clearly the criteria that need to be tested, rather than to describe how the implementation should be performed. In particular the different comparisons of the address used to determine which part of the memory map and which DRAM region (if applicable) is being accessed, and the permission checking, should all be performed in parallel (with results ORed together where appropriate) rather than sequentially as the pseudocode implies.

[1229] PS0 Description: This first segment of code defines a number of constants and variables that are used elsewhere in this description. Most signals have been defined in the I/O descriptions of the MMU sub-blocks that precede this section of the document. The post_reset_state variable is used later (in section PS4) to determine if we should trap a null pointer access.

PS0:
const UnusedBottom = 0x002AC000
const DRAMTop = 0x4027FFFF
const UserDataSpace = b01
const UserProgramSpace = b00
const SupervisorDataSpace = b11
const SupervisorProgramSpace = b10
const ResetExceptionCycles = 0x2
cpu_adr_peri_masked[5:0] = cpu_mmu_adr[17:12]
cpu_adr_dram_masked[16:0] = cpu_mmu_adr & 0x003FFFE0
if (prst_n == 0) then // Initialise everything
    cpu_adr = cpu_mmu_adr[21:2]
    peri_access_en = 0
    dram_access_en = 0
    mmu_cpu_data = peri_mmu_data
    mmu_cpu_rdy = 0
    mmu_cpu_berr = 0
    post_reset_state = TRUE
    access_initiated = FALSE
    cpu_access_cnt = 0
// The following is used to determine if we are coming out of reset for the purposes of
// reset exception vector redirection. There may be a convenient signal in the CPU core
// that we could use instead of this.
if ((cpu_start_access == 1) AND (cpu_access_cnt < ResetExceptionCycles) AND (clock_tick == TRUE)) then
    cpu_access_cnt = cpu_access_cnt + 1
else
    post_reset_state = FALSE

[1230] PS1 Description: This section is at the top of the hierarchy that determines the validity of an access. The address is tested to see which macro-region (i.e. Unused, CPU Subsystem or DRAM) it falls into or whether the reset exception vector is being accessed.

PS1:
if (cpu_mmu_adr >= UnusedBottom) then
    // The access is to an invalid area of the address space. See section PS2
elsif ((cpu_mmu_adr > DRAMTop) AND (cpu_mmu_adr < UnusedBottom)) then
    // We are in the CPU Subsystem/PEP Subsystem address space. See section PS3
// Only remaining possibility is an access to DRAM address space
// First we need to intercept the special case for the reset exception vector
elsif (cpu_mmu_adr < 0x00000010) then
    // The reset exception is being accessed. See section PS4
elsif ((cpu_adr_dram_masked >= Region0Bottom) AND (cpu_adr_dram_masked <= Region0Top)) then
    // We are in Region0. See section PS5
elsif ((cpu_adr_dram_masked >= RegionNBottom) AND (cpu_adr_dram_masked <= RegionNTop)) then
    // we are in RegionN
    // Repeat the Region0 (i.e. section PS5) logic for each of Region1 to Region7
else // We could end up here if there were gaps in the DRAM regions
    peri_access_en = 0
    dram_access_en = 0
    mmu_cpu_berr = 1 // we have an unknown access error, most likely due to hitting
    mmu_cpu_rdy = 0  // a gap in the DRAM regions
// Only thing remaining is to implement a bus timeout function. This is done in PS6
end

[1231] PS2 Description: Accesses to the large unused area of the address space are trapped by this section. No bus transactions are initiated and the mmu_cpu_berr signal is asserted.

PS2:
elsif (cpu_mmu_adr >= UnusedBottom) then
    peri_access_en = 0 // The access is to an invalid area of the address space
    dram_access_en = 0
    mmu_cpu_berr = 1
    mmu_cpu_rdy = 0

[1232] PS3 Description: This section deals with accesses to CPU Subsystem peripherals, including the MMU itself. If the MMU registers are being accessed then no external bus transactions are required. Access to the MMU registers is only permitted if the CPU is making a data access from supervisor mode, otherwise a bus error is asserted and the access terminated. For non-MMU accesses, transactions occur over the CPU Subsystem Bus and each peripheral is responsible for determining whether or not the CPU is in the correct mode (based on the cpu_acode signals) to be permitted access to its registers. Note that all of the PEP registers are accessed via the PCU which is on the CPU Subsystem Bus.

PS3:
elsif ((cpu_mmu_adr > DRAMTop) AND (cpu_mmu_adr < UnusedBottom)) then
    // We are in the CPU Subsystem/PEP Subsystem address space
    cpu_adr = cpu_mmu_adr[21:2]
    if (cpu_adr_peri_masked == MMU_base) then // access is to local registers
        peri_access_en = 0
        dram_access_en = 0
        if (cpu_acode == SupervisorDataSpace) then
            for (i=0; i<26; i++) {
                if (i == cpu_mmu_adr[6:2]) then // selects the addressed register
                    if (cpu_rwn == 1) then
                        mmu_cpu_data[16:0] = MMUReg[i] // MMUReg[i] is one of the
                        mmu_cpu_rdy = 1                // registers in Table
                        mmu_cpu_berr = 0
                    else // write cycle
                        MMUReg[i] = cpu_dataout[16:0]
                        mmu_cpu_rdy = 1
                        mmu_cpu_berr = 0
                else // there is no register mapped to this address
                    mmu_cpu_berr = 1 // do we really want a bus_error here as registers
                    mmu_cpu_rdy = 0  // are just mirrored in other blocks
            }
        else // we have an access violation
            mmu_cpu_berr = 1
            mmu_cpu_rdy = 0
    else // access is to something else on the CPU Subsystem Bus
        peri_access_en = 1
        dram_access_en = 0
        mmu_cpu_data = peri_mmu_data
        mmu_cpu_rdy = peri_mmu_rdy
        mmu_cpu_berr = peri_mmu_berr

[1233] PS4 Description: The only correct accesses to the locations beneath 0x00000010 are fetches of the reset trap handling routine and these should be the first accesses after reset. Here we trap all other accesses to these locations regardless of the CPU mode. The most likely cause of such an access will be the use of a null pointer in the program executing on the CPU.

PS4:
elsif (cpu_mmu_adr < 0x00000010) then
    if (post_reset_state == TRUE) then
        cpu_adr = cpu_mmu_adr[21:2]
        peri_access_en = 1
        dram_access_en = 0
        mmu_cpu_data = peri_mmu_data
        mmu_cpu_rdy = peri_mmu_rdy
        mmu_cpu_berr = peri_mmu_berr
    else // we have a problem (almost certainly a null pointer)
        peri_access_en = 0
        dram_access_en = 0
        mmu_cpu_berr = 1
        mmu_cpu_rdy = 0

[1234] PS5 Description: This large section of pseudocode simply checks whether the access is within the bounds of DRAM Region0 and if so whether or not the access is of a type permitted by the Region0Control register. If the access is permitted then a DRAM access is initiated. If the access is not of a type permitted by the Region0Control register then the access is terminated with a bus error.

PS5:
elsif ((cpu_adr_dram_masked >= Region0Bottom) AND (cpu_adr_dram_masked <= Region0Top)) then // we are in Region0
    cpu_adr = cpu_mmu_adr[21:2]
    if (cpu_rwn == 1) then
        if ((cpu_acode == SupervisorProgramSpace AND Region0Control[2] == 1)
            OR (cpu_acode == UserProgramSpace AND Region0Control[5] == 1)) then
            // this is a valid instruction fetch from Region0
            // The dram_cpu_data bus goes directly to the LEON
            // AHB bridge which also handles the hready generation
            peri_access_en = 0
            dram_access_en = 1
            mmu_cpu_berr = 0
        elsif ((cpu_acode == SupervisorDataSpace AND Region0Control[0] == 1)
            OR (cpu_acode == UserDataSpace AND Region0Control[3] == 1)) then
            // this is a valid read access from Region0
            peri_access_en = 0
            dram_access_en = 1
            mmu_cpu_berr = 0
        else // we have an access violation
            peri_access_en = 0
            dram_access_en = 0
            mmu_cpu_berr = 1
            mmu_cpu_rdy = 0
    else // it is a write access
        if ((cpu_acode == SupervisorDataSpace AND Region0Control[1] == 1)
            OR (cpu_acode == UserDataSpace AND Region0Control[4] == 1)) then
            // this is a valid write access to Region0
            peri_access_en = 0
            dram_access_en = 1
            mmu_cpu_berr = 0
        else // we have an access violation
            peri_access_en = 0
            dram_access_en = 0
            mmu_cpu_berr = 1
            mmu_cpu_rdy = 0
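
The permission test in PS5 amounts to a lookup from the pair (access code, read or write) to a single bit of the region control register. The Python sketch below captures that mapping as read from the pseudocode; the function and constant names are hypothetical, and the bounds check and bus signalling are left out.

```python
USER_PROGRAM, USER_DATA, SUPERVISOR_PROGRAM, SUPERVISOR_DATA = 0b00, 0b01, 0b10, 0b11

# (access code, is_read) -> RegionNControl bit that must be set, following PS5.
PERMISSION_BIT = {
    (SUPERVISOR_DATA, True): 0,
    (SUPERVISOR_DATA, False): 1,
    (SUPERVISOR_PROGRAM, True): 2,
    (USER_DATA, True): 3,
    (USER_DATA, False): 4,
    (USER_PROGRAM, True): 5,
}


def region_access_allowed(control, acode, cpu_rwn):
    """Return True if the region control value permits this access."""
    bit = PERMISSION_BIT.get((acode, cpu_rwn == 1))
    return bit is not None and bool(control & (1 << bit))


user_read_only = 0b001000    # only user data reads enabled
allowed = region_access_allowed(user_read_only, USER_DATA, cpu_rwn=1)
denied_write = region_access_allowed(user_read_only, USER_DATA, cpu_rwn=0)
# Writes in program space have no enabling bit, so they are always refused.
denied_program_write = region_access_allowed(0b111111, USER_PROGRAM, cpu_rwn=0)
```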

[1235] PS6 Description: This final section of pseudocode deals with the special case of a bus timeout. This occurs when an access has been initiated but has not completed within the BusTimeout number of pclk cycles. While accesses to both DRAM and CPU/PEP Subsystem registers will take a variable number of cycles (due to DRAM traffic, PCU command execution or the different timing required to access registers in imported IP) each access should complete before a timeout occurs. Therefore it should not be possible to stall the CPU by locking either the CPU Subsystem or DIU buses. However, given the fatal effect such a stall would have, it is considered prudent to implement bus timeout detection.

PS6:
// Only thing remaining is to implement a bus timeout function.
if (cpu_start_access == 1) then
    access_initiated = TRUE
    timeout_countdown = BusTimeout
if ((mmu_cpu_rdy == 1) OR (mmu_cpu_berr == 1)) then
    access_initiated = FALSE
    peri_access_en = 0
    dram_access_en = 0
if ((clock_tick == TRUE) AND (access_initiated == TRUE) AND (BusTimeout != 0)) then
    if (timeout_countdown > 0) then
        timeout_countdown--
    else // timeout has occurred
        peri_access_en = 0 // abort the access
        dram_access_en = 0
        mmu_cpu_berr = 1
        mmu_cpu_rdy = 0
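
The countdown in PS6 can be exercised cycle by cycle with a small model. The Python sketch below is illustrative only: the class name and the tick interface are hypothetical, one call to tick stands for one pclk cycle, and the bus error raised on timeout is modelled simply by returning True and ending the access.

```python
class BusTimeout:
    """Cycle-level model of the countdown described in PS6."""

    def __init__(self, bus_timeout):
        self.bus_timeout = bus_timeout    # 0 disables timeout detection
        self.access_initiated = False
        self.countdown = 0

    def tick(self, cpu_start_access, access_done):
        """Advance one pclk cycle; return True if a timeout bus error fires."""
        if cpu_start_access:
            self.access_initiated = True
            self.countdown = self.bus_timeout
        if access_done:                   # mmu_cpu_rdy or mmu_cpu_berr was seen
            self.access_initiated = False
        if self.access_initiated and self.bus_timeout != 0:
            if self.countdown > 0:
                self.countdown -= 1
            else:
                self.access_initiated = False   # access aborted with a bus error
                return True
        return False


timer = BusTimeout(bus_timeout=3)
# An access starts in cycle 0 and never completes.
results = [timer.tick(cpu_start_access=(cycle == 0), access_done=False)
           for cycle in range(6)]
```

With BusTimeout set to 3 the model reports the timeout in the fourth cycle after the access starts, and stays quiet thereafter.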

[1236] 11.7 LEON Caches

[1237] The version of LEON implemented on SoPEC features 1 kB of ICache and 1 kB of DCache. Both caches are direct mapped and feature 8 word lines so their data RAMs are arranged as 32×256-bit and their tag RAMs as 32×30-bit (itag) or 32×32-bit (dtag). Like most of the rest of the LEON code used on SoPEC the cache controllers are taken from the leon2-1.0.7 release. The LEON cache controllers and cache RAMs have been modified to ensure that an entire 256-bit line is refilled at a time to make maximum use out of the memory bandwidth offered by the embedded DRAM organization (DRAM lines are also 256-bit). The data cache controller has also been modified to ensure that user mode code cannot access the DCache contents unless it is authorised to do so. A block diagram of the LEON CPU core as implemented on SoPEC is shown in FIG. 23 below.
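
The 32×256-bit arrangement follows directly from the cache size and line length. The Python sketch below works through the arithmetic and shows the conventional direct-mapped address split that results. The function name is hypothetical, and the actual layout of the LEON tag RAM entries (which also hold valid bits) is not modelled.

```python
CACHE_BYTES = 1024        # 1 kB per cache
WORDS_PER_LINE = 8
BYTES_PER_WORD = 4

line_bytes = WORDS_PER_LINE * BYTES_PER_WORD    # 32 bytes = 256 bits
num_lines = CACHE_BYTES // line_bytes           # 32 lines


def split_address(addr):
    """Split a byte address into (tag, line index, word within line)."""
    word = (addr // BYTES_PER_WORD) % WORDS_PER_LINE
    index = (addr // line_bytes) % num_lines
    tag = addr // CACHE_BYTES
    return tag, index, word


example = split_address(0x0000_1234)
```

For address 0x1234 this gives tag 4, line 17 and word 5, since 4×1024 + 17×32 + 5×4 = 0x1234.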

[1238] In this diagram dotted lines are used to indicate hierarchy and red items represent signals or wrappers added as part of the SoPEC modifications. LEON makes heavy use of VHDL records and the records used in the CPU core are described in Table 25. Unless otherwise stated the records are defined in the iface.vhd file (part of the LEON release) and this should be consulted for a complete breakdown of the record elements.

TABLE 25 Relevant LEON records
Record Name | Description
rfi | Register File Input record. Contains address, datain and control signals for the register file.
rfo | Register File Output record. Contains the data out of the dual read port register file.
ici | Instruction Cache In record. Contains program counters from different stages of the pipeline and various control signals
ico | Instruction Cache Out record. Contains the fetched instruction data and various control signals. This record is also sent to the DCache (i.e. icol) so that diagnostic accesses (e.g. lda/sta) can be serviced.
dci | Data Cache In record. Contains address and data buses from different stages of the pipeline (execute & memory) and various control signals
dco | Data Cache Out record. Contains the data retrieved from either memory or the caches and various control signals. This record is also sent to the ICache (i.e. dcol) so that diagnostic accesses (e.g. lda/sta) can be serviced.
iui | Integer Unit In record. This record contains the interrupt request level and a record for use with LEON's Debug Support Unit (DSU)
iuo | Integer Unit Out record. This record contains the acknowledged interrupt request level with control signals and a record for use with LEON's Debug Support Unit (DSU)
mcii | Memory to Cache Icache In record. Contains the address of an Icache miss and various control signals
mcio | Memory to Cache Icache Out record. Contains the returned data from memory and various control signals
mcdi | Memory to Cache Dcache In record. Contains the address and data of a Dcache miss or write and various control signals
mcdo | Memory to Cache Dcache Out record. Contains the returned data from memory and various control signals
ahbi | AHB In record. This is the input record for an AHB master and contains the data bus and AHB control signals. The destination for the signals in this record is the AHB controller. This record is defined in the amba.vhd file
ahbo | AHB Out record. This is the output record for an AHB master and contains the address and data buses and AHB control signals. The AHB controller drives the signals in this record. This record is defined in the amba.vhd file
ahbsi | AHB Slave In record. This is the input record for an AHB slave and contains the address and data buses and AHB control signals. It is used by the DCache to facilitate cache snooping (this feature is not enabled in SoPEC). This record is defined in the amba.vhd file
crami | Cache RAM In record. This record is composed of records of records which contain the address, data and tag entries with associated control signals for both the ICache RAM and DCache RAM
cramo | Cache RAM Out record. This record is composed of records of records which contain the data and tag entries with associated control signals for both the ICache RAM and DCache RAM
iline_rdy | Control signal from the ICache controller to the instruction cache memory. This signal is active (high) when a full 256-bit line (on dram_cpu_data) is to be written to cache memory.
dline_rdy | Control signal from the DCache controller to the data cache memory. This signal is active (high) when a full 256-bit line (on dram_cpu_data) is to be written to cache memory.
dram_cpu_data | 256-bit data bus from the embedded DRAM

[1239] 11.7.1 Cache Controllers

[1240] The LEON cache module consists of three components: the ICache controller (icache.vhd), the DCache controller (dcache.vhd) and the AHB bridge (acache.vhd) which translates all cache misses into memory requests on the AHB bus.

[1241] In order to enable full line refill operation a few changes had to be made to the cache controllers. The ICache controller was modified to ensure that whenever a location in the cache was updated (i.e. the cache was enabled and was being refilled from DRAM) all locations on that cache line had their valid bits set to reflect the fact that the full line was updated. The iline_rdy signal is asserted by the ICache controller when this happens and this informs the cache wrappers to update all locations in the idata RAM for that line.

[1242] A similar change was made to the DCache controller except that the entire line was only updated following a read miss and that existing write-through operation was preserved. The DCache controller uses the dline_rdy signal to instruct the cache wrapper to update all locations in the ddata RAM for a line. An additional modification was also made to ensure that a double-word load instruction from a non-cached location would only result in one read access to the DIU i.e. the second read would be serviced by the data cache. Note that if the DCache is turned off then a double-word load instruction will cause two DIU read accesses to occur even though they will both be to the same 256-bit DRAM line.

[1243] The DCache controller was further modified to ensure that user mode code cannot access cached data to which it does not have permission (as determined by the relevant RegionNControl register settings at the time the cache line was loaded). This required an extra 2 bits of tag information to record the user read and write permissions for each cache line. These user access permissions can be updated in the same manner as the other tag fields (i.e. address and valid bits) namely by line refill, STA instruction or cache flush. The user access permission bits are checked every time user code attempts to access the data cache and if the permissions of the access do not agree with the permissions returned from the tag RAM then a cache miss occurs. As the MMU evaluates the access permissions for every cache miss it will generate the appropriate exception for the forced cache miss caused by the errant user code. In the case of a prohibited read access the trap will be immediate while a prohibited write access will result in a deferred trap. The deferred trap results from the fact that the prohibited write is committed to a write buffer in the DCache controller and program execution continues until the prohibited write is detected by the MMU which may be several cycles later. Because the errant write was treated as a write miss by the DCache controller (as it did not match the stored user access permissions) the cache contents were not updated and so remain coherent with the DRAM contents (which do not get updated because the MMU intercepted the prohibited write). Supervisor mode code is not subject to such checks and so has free access to the contents of the data cache.
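The permission check described in this paragraph can be sketched as a small behavioural model. This is an illustration in Python, not the LEON RTL: the names DTagEntry and dcache_hit are hypothetical, and the eight per-word valid bits are collapsed into a single flag for brevity.

```python
from dataclasses import dataclass


@dataclass
class DTagEntry:
    """Simplified data cache tag entry (hypothetical model, not the RTL)."""
    address: int   # tag address of the cached line
    valid: bool    # simplification: one valid flag instead of 8 per-word bits
    urp: bool      # user read permission recorded when the line was loaded
    uwp: bool      # user write permission recorded when the line was loaded


def dcache_hit(entry: DTagEntry, tag: int, is_write: bool, supervisor: bool) -> bool:
    """Return True on a cache hit, False on a (possibly forced) miss."""
    if not entry.valid or entry.address != tag:
        return False            # ordinary miss
    if supervisor:
        return True             # supervisor code is not subject to the check
    # User mode: a permission mismatch is turned into a forced miss so that
    # the access reaches the MMU, which raises the appropriate exception.
    return entry.uwp if is_write else entry.urp
```

In this model a user mode write to a line loaded without write permission is reported as a miss, which is the behaviour that lets the MMU see and trap the access while the cache contents stay unchanged.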

[1244] In addition to AHB bridging, the ACache component also performs arbitration between ICache and DCache misses when simultaneous misses occur (the DCache always wins) and implements the Cache Control Register (CCR). The leon2-1.0.7 release is inconsistent in how it handles cacheability: for instruction fetches the cacheability (i.e. whether the access is to an area of memory that is cacheable) is determined by the ICache controller while the ACache determines whether or not a data access is cacheable. To further complicate matters the DCache controller does determine if an access resulting from a cache snoop by another AHB master is cacheable (note that the SoPEC ASIC does not implement cache snooping as it has no need to do so). This inconsistency has been cleaned up in more recent LEON releases but is preserved here to minimise the number of changes to the LEON RTL. The cache controllers were modified to ensure that only DRAM accesses (as defined by the SoPEC memory map) are cached.

[1245] The only functionality removed as a result of the modifications was support for burst fills of the ICache. When enabled, burst fills would refill an ICache line from the location where a miss occurred up to the end of the line. As the entire line is now refilled at once (when executing from DRAM) this functionality is no longer required. Furthermore, more substantial modifications to the ICache controller would be needed if we wished to preserve this function without adversely affecting full line refills. The CCR was therefore modified to ensure that the instruction burst fetch bit (bit 16) was tied low and could not be written to.

[1246] 11.7.1.1 LEON Cache Control Register

[1247] The CCR controls the operation of both the I and D caches. Note that the bitfields used on the SoPEC implementation of this register are based on the LEON v1.0.7 implementation and some bits have their values tied off. See section 4 of the LEON manual for a description of the LEON cache controllers.

TABLE 26 LEON Cache Control Register
Field Name | bit(s) | Description
ICS | 1:0 | Instruction cache state: 00 - disabled; 01 - frozen; 10 - disabled; 11 - enabled
DCS | 3:2 | Data cache state: 00 - disabled; 01 - frozen; 10 - disabled; 11 - enabled
IF | 4 | ICache freeze on interrupt. 0 - Do not freeze the ICache contents on taking an interrupt; 1 - Freeze the ICache contents on taking an interrupt
DF | 5 | DCache freeze on interrupt. 0 - Do not freeze the DCache contents on taking an interrupt; 1 - Freeze the DCache contents on taking an interrupt
Reserved | 13:6 | Reserved. Reads as 0.
DP | 14 | Data cache flush pending. 0 - No DCache flush in progress; 1 - DCache flush in progress. This bit is read-only.
IP | 15 | Instruction cache flush pending. 0 - No ICache flush in progress; 1 - ICache flush in progress. This bit is read-only.
IB | 16 | Instruction burst fetch enable. This bit is tied low on SoPEC because it would interfere with the operation of the cache wrappers. Burst refill functionality is automatically provided in SoPEC by the cache wrappers.
Reserved | 20:17 | Reserved. Reads as 0.
FI | 21 | Flush instruction cache. Writing a 1 to this bit will flush the ICache. Reads as 0.
FD | 22 | Flush data cache. Writing a 1 to this bit will flush the DCache. Reads as 0.
DS | 23 | Data cache snoop enable. This bit is tied low in SoPEC as there is no requirement to snoop the data cache.
Reserved | 31:24 | Reserved. Reads as 0.
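As an illustration of the field layout in Table 26, the following Python sketch decodes a 32-bit CCR value into named fields. The function name decode_ccr and the dictionary layout are assumptions made for this example; only the bit positions and state encodings come from the table.

```python
# State encoding shared by the ICS and DCS fields (from Table 26).
CACHE_STATES = {0b00: "disabled", 0b01: "frozen", 0b10: "disabled", 0b11: "enabled"}


def decode_ccr(value: int) -> dict:
    """Decode a 32-bit Cache Control Register value (hypothetical helper)."""
    def bit(n: int) -> bool:
        return bool((value >> n) & 1)

    return {
        "ICS": CACHE_STATES[value & 0x3],          # bits 1:0
        "DCS": CACHE_STATES[(value >> 2) & 0x3],   # bits 3:2
        "IF": bit(4),    # ICache freeze on interrupt
        "DF": bit(5),    # DCache freeze on interrupt
        "DP": bit(14),   # DCache flush pending (read-only)
        "IP": bit(15),   # ICache flush pending (read-only)
        "IB": bit(16),   # tied low on SoPEC
        "FI": bit(21),   # write 1 to flush ICache, reads as 0
        "FD": bit(22),   # write 1 to flush DCache, reads as 0
        "DS": bit(23),   # tied low on SoPEC
    }
```

For example, a value of 0x0000000F decodes as both caches enabled with no freeze-on-interrupt and no flush pending.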

[1248] 11.7.2 Cache Wrappers

[1249] The cache RAMs used in the leon2-1.0.7 release needed to be modified to support full line refills and the correct IBM macros also needed to be instantiated. Although they are described as RAMs throughout this document (for consistency), register arrays are actually used to implement the cache RAMs. This is because IBM SRAMs were not available in suitable configurations (offered configurations were too big) to implement either the tag or data cache RAMs. Both instruction and data tag RAMs are implemented using dual port (1 Read & 1 Write) register arrays and the clocked write-through versions of the register arrays were used as they most closely approximate the single port SRAM LEON expects to see.

[1250] 11.7.2.1 Cache Tag RAM Wrappers

[1251] The itag and dtag RAMs differ only in their width: the itag is a 32×30 array while the dtag is a 32×32 array with the extra 2 bits being used to record the user access permissions for each line. When read using an LDA instruction both tags return 32-bit words. The tag fields are described in Table 27 and Table 28 below. Using the IBM naming conventions the register arrays used for the tag RAMs are called RA032X30D2P2W1R1M3 for the itag and RA032X32D2P2W1R1M3 for the dtag. The ibm_syncram wrapper used for the tag RAMs is a simple affair that just maps the wrapper ports on to the appropriate ports of the IBM register array and ensures the output data has the correct timing by registering it. The tag RAMs do not require any special modifications to handle full line refills.

TABLE 27 LEON Instruction Cache Tag
Field Name | bit(s) | Description
Valid | 7:0 | Each valid bit indicates whether or not the corresponding word of the cache line contains valid data
Reserved | 9:8 | Reserved - these bits do not exist in the itag RAM. Reads as 0.
Address | 31:10 | The tag address of the cache line

[1252] TABLE 28 LEON Data Cache Tag
Field Name | bit(s) | Description
Valid | 7:0 | Each valid bit indicates whether or not the corresponding word of the cache line contains valid data
URP | 8 | User read permission. 0 - User mode reads will force a refill of this line; 1 - User mode code can read from this cache line.
UWP | 9 | User write permission. 0 - User mode writes will not be written to the cache; 1 - User mode code can write to this cache line.
Address | 31:10 | The tag address of the cache line
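The cache geometry and the tag layout fit together arithmetically: a 1 kB direct mapped cache with 8-word (32-byte) lines has 32 lines, so a 32-bit byte address needs 5 bits to select the line and leaves 22 bits (31:10) for the tag address. With 8 valid bits that gives the 30-bit itag entry, and the two permission bits bring the dtag entry to 32 bits. The Python sketch below works through this; the function names, the example address and the assumption that valid bit i corresponds to word i of the line are illustrative and are not taken from the LEON sources.

```python
WORD_BITS = 32
WORDS_PER_LINE = 8
CACHE_BYTES = 1024

LINE_BYTES = WORDS_PER_LINE * WORD_BITS // 8   # 32 bytes = 256 bits per line
NUM_LINES = CACHE_BYTES // LINE_BYTES          # 32 lines


def split_address(addr: int):
    """Split a 32-bit byte address into (tag, line index, word index)."""
    word = (addr >> 2) & (WORDS_PER_LINE - 1)  # address bits 4:2
    line = (addr >> 5) & (NUM_LINES - 1)       # address bits 9:5
    tag = (addr >> 10) & 0x3FFFFF              # address bits 31:10
    return tag, line, word


def decode_dtag(entry: int) -> dict:
    """Decode a 32-bit data cache tag word using the Table 28 layout."""
    return {
        "valid": entry & 0xFF,                 # bits 7:0, one bit per word
        "urp": bool((entry >> 8) & 1),         # bit 8, user read permission
        "uwp": bool((entry >> 9) & 1),         # bit 9, user write permission
        "address": (entry >> 10) & 0x3FFFFF,   # bits 31:10, tag address
    }


def word_is_valid(entry: int, word: int) -> bool:
    """Assumes valid bit i marks word i of the line (ordering is an assumption)."""
    return bool((decode_dtag(entry)["valid"] >> word) & 1)
```

With the arbitrary example address 0x40000424, split_address yields tag 0x100001, line 1 and word 1.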

[1253] 11.7.2.2 Cache Data RAM Wrappers

[1254] The cache data RAM contains the actual cached data and nothing else. Both the instruction and data cache data RAMs are implemented using 8 32×32-bit register arrays and some additional logic to support full line refills. Using the IBM naming conventions the register arrays used for the data RAMs are called RA032X32D2P2W1R1M3. The ibm_cdram_wrap wrapper used for the data RAMs is shown in FIG. 24 below.

[1255] To the cache controllers the cache data RAM wrapper looks like a 256×32 single port SRAM (which is what they expect to see) with an input to indicate when a full line refill is taking place (the line_rdy signal). Internally the 8-bit address bus is split into a 5-bit lineaddress, which selects one of the 32 256-bit cache lines, and a 3-bit wordaddress which selects one of the 8 32-bit words on the cache line. Thus each of the 8 32×32 register arrays contains one 32-bit word of each cache line. When a full line is being refilled (indicated by both the line_rdy and write signals being high) every register array is written to with the appropriate 32 bits from the linedatain bus which contains the 256-bit line returned by the DIU after a cache miss. When just one word of the cache line is to be written (indicated by the write signal being high while the line_rdy is low) then the wordaddress is used to enable the write signal to the selected register array only; all other write enable signals are kept low. The data cache controller handles byte and half-word writes by means of a read-modify-write operation so writes to the cache data RAM are always 32-bit.

[1256] The wordaddress is also used to select the correct 32-bit word from the cache line to return to the LEON integer unit.
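The behaviour of the cache data RAM wrapper can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the ibm_cdram_wrap implementation: the class and method names are hypothetical, the lineaddress is assumed to occupy the upper 5 bits of the 8-bit address, and word 0 is assumed to sit in the least significant 32 bits of the 256-bit line (the text does not specify either ordering).

```python
class CacheDataRamModel:
    """Behavioural model of 8 register arrays, each 32 entries x 32 bits."""

    def __init__(self):
        # arrays[w][l] holds word w of cache line l
        self.arrays = [[0] * 32 for _ in range(8)]

    @staticmethod
    def _split(addr8: int):
        lineaddress = (addr8 >> 3) & 0x1F   # assumed: upper 5 bits
        wordaddress = addr8 & 0x7           # assumed: lower 3 bits
        return lineaddress, wordaddress

    def write(self, addr8: int, datain: int, linedatain: int, line_rdy: bool):
        """Model a write cycle (the write signal is implied by the call)."""
        lineaddress, wordaddress = self._split(addr8)
        if line_rdy:
            # Full line refill: every register array gets its 32-bit slice.
            for w in range(8):
                self.arrays[w][lineaddress] = (linedatain >> (32 * w)) & 0xFFFFFFFF
        else:
            # Single word write: only the selected array is write-enabled.
            self.arrays[wordaddress][lineaddress] = datain & 0xFFFFFFFF

    def read(self, addr8: int) -> int:
        lineaddress, wordaddress = self._split(addr8)
        return self.arrays[wordaddress][lineaddress]
```

A full line refill followed by a single word write to the same line changes only the addressed word, which matches the write-enable behaviour described above.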

[1257] 11.8 Realtime Debug Unit (RDU)

[1258] The RDU facilitates the observation of the contents of most of the CPU addressable registers in the SoPEC device in addition to some pseudo-registers in realtime. The contents of pseudo-registers, i.e. registers that are collections of otherwise unobservable signals and that do not affect the functionality of a circuit, are defined in each block as required. Many blocks do not have pseudo-registers and some blocks (e.g. ROM, PSS) do not make debug information available to the RDU as it would be of little value in realtime debug.

[1259] Each block that supports realtime debug observation features a DebugSelect register that controls a local mux to determine which register is output on the block's data bus (i.e. block_cpu_data). One small drawback with reusing the block's data bus is that the debug data cannot be present on the same bus during a CPU read from the block. An accompanying active high block_cpu_debug_valid signal is used to indicate when the data bus contains valid debug data and when the bus is being used by the CPU. There is no arbitration for the bus as the CPU will always have access when required. A block diagram of the RDU is shown in FIG. 25.

TABLE 29 RDU I/Os
Port name | Pins | I/O | Description
diu_cpu_data | 32 | In | Read data bus from the DIU block
cpr_cpu_data | 32 | In | Read data bus from the CPR block
gpio_cpu_data | 32 | In | Read data bus from the GPIO block
icu_cpu_data | 32 | In | Read data bus from the ICU block
lss_cpu_data | 32 | In | Read data bus from the LSS block
pcu_cpu_debug_data | 32 | In | Read data bus from the PCU block
scb_cpu_data | 32 | In | Read data bus from the SCB block
tim_cpu_data | 32 | In | Read data bus from the TIM block
diu_cpu_debug_valid | 1 | In | Signal indicating the data on the diu_cpu_data bus is valid debug data.
tim_cpu_debug_valid | 1 | In | Signal indicating the data on the tim_cpu_data bus is valid debug data.
scb_cpu_debug_valid | 1 | In | Signal indicating the data on the scb_cpu_data bus is valid debug data.
pcu_cpu_debug_valid | 1 | In | Signal indicating the data on the pcu_cpu_data bus is valid debug data.
lss_cpu_debug_valid | 1 | In | Signal indicating the data on the lss_cpu_data bus is valid debug data.
icu_cpu_debug_valid | 1 | In | Signal indicating the data on the icu_cpu_data bus is valid debug data.
gpio_cpu_debug_valid | 1 | In | Signal indicating the data on the gpio_cpu_data bus is valid debug data.
cpr_cpu_debug_valid | 1 | In | Signal indicating the data on the cpr_cpu_data bus is valid debug data.
debug_data_out | 32 | Out | Output debug data to be muxed on to the PHI/GPIO/other pins
debug_data_valid | 1 | Out | Debug valid signal indicating the validity of the data on debug_data_out. This signal is used in all debug configurations
debug_cntrl | 33 | Out | Control signal for each debug data line indicating whether or not the debug data should be selected by the pin mux

[1260] As there are no spare pins that can be used to output the debug data to an external capture device some of the existing I/Os will have a debug multiplexer placed in front of them to allow them to be used as debug pins. Furthermore not every pin that has a debug mux will always be available to carry the debug data as they may be engaged in their primary purpose e.g. as a GPIO pin. The RDU therefore outputs a debug_cntrl signal with each debug data bit to indicate whether the mux associated with each debug pin should select the debug data or the normal data for the pin. The DebugPinSel1 and DebugPinSel2 registers are used to determine which of the 33 potential debug pins are enabled for debug at any particular time.

[1261] As it may not always be possible to output a full 32-bit debug word every cycle the RDU supports the outputting of an n-bit sub-word every cycle to the enabled debug pins. Each debug test would then need to be re-run a number of times with a different portion of the debug word being output on the n-bit sub-word each time. The data from each run should then be correlated to create a full 32-bit (or whatever size is needed) debug word for every cycle. The debug_data_valid and pclk_out signals will accompany every sub-word to allow the data to be sampled correctly. The pclk_out signal is sourced close to its output pad rather than in the RDU to minimise the skew between the rising edge of the debug data signals (which should be registered close to their output pads) and the rising edge of pclk_out.

[1262] As multiple debug runs will be needed to obtain a complete set of debug data the n-bit sub-word will need to contain a different bit pattern for each run. For maximum flexibility each debug pin has an associated DebugDataSrc register that allows any of the 32 bits of the debug data word to be output on that particular debug data pin. The debug data pin must be enabled for debug operation by having its corresponding bit in the DebugPinSel registers set for the selected debug data bit to appear on the pin.
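The multi-run sub-word scheme can be illustrated with a short Python sketch. It models only the bit selection per pin and the off-line reassembly of a debug word from several runs; the function names drive_pins and reassemble, the choice of 8 enabled pins and the example debug word are assumptions for the illustration.

```python
def drive_pins(debug_word: int, data_src: list, pin_enable: int) -> dict:
    """Return {pin: bit} for every enabled debug data pin.

    data_src[pin] plays the role of that pin's DebugDataSrc setting (which
    bit of the 32-bit debug word it outputs); pin_enable plays the role of
    the DebugPinSel2 bits.
    """
    return {
        pin: (debug_word >> data_src[pin]) & 1
        for pin in range(32)
        if (pin_enable >> pin) & 1
    }


def reassemble(runs: list) -> int:
    """Rebuild one debug word from (data_src, captured_pins) pairs, one per run."""
    word = 0
    for data_src, captured in runs:
        for pin, bit in captured.items():
            word |= bit << data_src[pin]
    return word


# Example: only 8 pins are available, so 4 runs are needed for 32 bits.
debug_word = 0xCAFEF00D
pin_enable = 0x000000FF
runs = []
for run in range(4):
    data_src = [(8 * run + pin) % 32 for pin in range(32)]
    runs.append((data_src, drive_pins(debug_word, data_src, pin_enable)))
recovered = reassemble(runs)
```

Here each run routes a different byte of the debug word to the same 8 pins, and correlating the four captures recovers the full 32-bit word for that cycle.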

[1263] The size of the sub-word is determined by the number of enabled debug pins which is controlled by the DebugPinSel registers. Note that the debug_data_valid signal is always output. Furthermore debug_cntrl[0] (which is configured by DebugPinSel1) controls the mux for both the debug_data_valid and pclk_out signals as both of these must be enabled for any debug operation. The mapping of debug_data_out[n] signals onto individual pins will take place outside the RDU. This mapping is described in Table 30 below.

TABLE 30 DebugPinSel mapping
bit # | Pin
DebugPinSel1 | phi_frclk. The debug_data_valid signal will appear on this pin when enabled. Enabling this pin also automatically enables the phi_readl pin which will output the pclk_out signal
DebugPinSel2(0-31) | gpio[0 ... 31]

[1264] TABLE 31 RDU Configuration Registers
Address offset from MMU_base | Register | #bits | Reset | Description
0x80 | DebugSrc | 4 | 0x00 | Denotes which block is supplying the debug data. The encoding of this block is given below. 0 - MMU; 1 - TIM; 2 - LSS; 3 - GPIO; 4 - SCB; 5 - ICU; 6 - CPR; 7 - DIU; 8 - PCU
0x84 | DebugPinSel1 | 1 | 0x0 | Determines whether the phi_frclk and phi_readl pins are used for debug output. 1 - Pin outputs debug data; 0 - Normal pin function
0x88 | DebugPinSel2 | 32 | 0x0000_0000 | Determines whether a pin is used for debug data output. 1 - Pin outputs debug data; 0 - Normal pin function
0x8C to 0x108 | DebugDataSrc | 32 × 5 | 0x00 | Selects which bit of the 32-bit debug data [31:0] word will be output on debug_data_out[N]

[1265] 11.9 Interrupt Operation

[1266] The interrupt controller unit (see chapter 14) generates an interrupt request by driving interrupt request lines with the appropriate interrupt level. LEON supports 15 levels of interrupt with level 15 as the highest level (the SPARC architecture manual [36] states that level 15 is non-maskable but we have the freedom to mask this if desired). The CPU will begin processing an interrupt exception when execution of the current instruction has completed and it will only do so if the interrupt level is higher than the current processor priority. If a second interrupt request arrives with the same level as an executing interrupt service routine then the exception will not be processed until the executing routine has completed.

[1267] When an interrupt trap occurs the LEON hardware will place the program counters (PC and nPC) into two local registers. The interrupt handler routine is expected, as a minimum, to place the PSR register in another local register to ensure that the LEON can correctly return to its pre-interrupt state. The 4-bit interrupt level (irl) is also written to the trap type (tt) field of the TBR (Trap Base Register) by hardware. The TBR then contains the vector of the trap handler routine to which the processor will then jump. The TBA (Trap Base Address) field of the TBR must have a valid value before any interrupt processing can occur so it should be configured at an early stage.
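As a worked example of how the TBR yields a handler address, the sketch below follows the SPARC V8 register layout, in which TBA occupies TBR bits 31:12, tt occupies bits 11:4, the low four bits are zero, and interrupt level n uses trap type 0x10 + n. These layout details come from the SPARC V8 architecture rather than from this document, and the function name and example base address are hypothetical.

```python
def interrupt_vector(tba: int, irl: int) -> int:
    """Compute the trap handler address for interrupt level irl (1..15).

    tba is the 32-bit trap table base; only bits 31:12 are significant.
    """
    if not 1 <= irl <= 15:
        raise ValueError("interrupt level must be in the range 1..15")
    tt = 0x10 + irl                       # SPARC V8 trap type for interrupt level irl
    return (tba & 0xFFFFF000) | (tt << 4) # each trap table entry is 16 bytes


# Example with an arbitrary trap table base address.
vector = interrupt_vector(0x40000000, 9)
```

With the trap table at the illustrative base 0x40000000, a level 9 interrupt vectors to offset 0x190 in the table, which is why the TBA field must be valid before any interrupt is taken.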

[1268] Interrupt pre-emption is supported while the ET (Enable Traps) bit of the PSR is set. This bit is cleared during the initial trap processing. In initial simulations the ET bit was observed to be cleared for up to 30 cycles. This causes significant additional interrupt latency in the worst case where a higher priority interrupt arrives just as a lower priority one is taken.
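The acceptance rule described in this section (a request is taken only when traps are enabled and its level exceeds the current processor priority) can be written as a one-line predicate. The sketch below is illustrative: the function name is hypothetical, a level of 0 is assumed to mean that no interrupt is requested, and how software maintains the current priority is outside its scope.

```python
def interrupt_taken(irl: int, current_priority: int, traps_enabled: bool) -> bool:
    """Return True if a request at level irl would be taken now.

    irl              -- requested interrupt level, 0 assumed to mean "none"
    current_priority -- current processor priority (0..15)
    traps_enabled    -- state of the PSR ET bit
    """
    if irl == 0 or not traps_enabled:
        return False                      # nothing pending, or ET is cleared
    return irl > current_priority         # equal levels must wait


# A level 9 request pre-empts a level 5 routine but not another level 9 one.
examples = [
    interrupt_taken(9, 5, True),
    interrupt_taken(9, 9, True),
    interrupt_taken(12, 5, False),
]
```

The third example corresponds to the worst case noted above: while ET is cleared during initial trap processing, even a higher priority request has to wait.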

[1269] The interrupt acknowledge cycles shown in FIG. 26 below are derived from simulations of the LEON processor. The SoPEC toplevel interrupt signals used in this diagram map directly to the LEON interrupt signals in the iui and iuo records. An interrupt is asserted by driving its (encoded) level on the icu_cpu_ilevel[3:0] signals (which map to iui.irl[3:0]). The LEON core responds to this, with variable timing, by reflecting the level of the taken interrupt on the cpu_icu_ilevel[3:0] signals (mapped to iuo.irl[3:0]) and asserting the acknowledge signal cpu_iack (iuo.intack). The interrupt controller then removes the interrupt level one cycle after it has seen the level being acknowledged by the core. If there is another pending interrupt (of lower priority) then this should be driven on icu_cpu_ilevel[3:0] and the CPU will take that interrupt (the level 9 interrupt in the example below) once it has finished processing the higher priority interrupt. The cpu_icu_ilevel[3:0] signals always reflect the level of the last taken interrupt, even when the CPU has finished processing all interrupts.

[1270] 11.10 Boot Operation

[1271] See section 17.2 for a description of the SoPEC boot operation.

[1272] 11.11 Software Debug

[1273] Software debug mechanisms are discussed in the “SoPEC Software Debug” document [15].

[1274] 12 Serial Communications Block (SCB)

[1275] 12.1 Overview

[1276] The Serial Communications Block (SCB) handles the movement of all data between the SoPEC and the host device (e.g. PC) and between master and slave SoPEC devices. The main components of the SCB are a Full-Speed (FS) USB Device Core, a FS USB Host Core, an Inter-SoPEC Interface (ISI), a DMA manager, the SCB Map and associated control logic. The need for these components and the various types of communication they provide is evident in a multi-SoPEC printer configuration.

[1277] 12.1.1 Multi-SoPEC Systems

[1278] While single SoPEC systems are expected to form the majority of SoPEC systems the SoPEC device must also support its use in multi-SoPEC systems such as that shown in FIG. 27. A SoPEC may be assigned any one of a number of identities in a multi-SoPEC system. A SoPEC may be one or more of a PrintMaster, a LineSyncMaster, an ISIMaster, a StorageSoPEC or an ISISlave SoPEC.

[1279] 12.1.1.1 ISIMaster Device

[1280] The ISIMaster is the only device that controls the common ISI lines (see FIG. 30) and typically interfaces directly with the host. In most systems the ISIMaster will simply be the SoPEC connected to the USB bus. Future systems, however, may employ an ISI-Bridge chip to interface between the host and the ISI bus and in such systems the ISI-Bridge chip will be the ISIMaster. There can only be one ISIMaster on an ISI bus.

[1281] Systems with multiple SoPECs may have more than one host connection, for example there could be two SoPECs communicating with the external host over their FS USB links (this would of course require two USB cables to be connected), but still only one ISIMaster.

[1282] While it is not expected to be required, it is possible for a device to hand over its role as the ISIMaster to another device on the ISI i.e. the ISIMaster is not necessarily fixed.

[1283] 12.1.1.2 PrintMaster Device

[1284] The PrintMaster device is responsible for coordinating all aspects of the print operation. This includes starting the print operation in all printing SoPECs and communicating status back to the external host. When the ISIMaster is a SoPEC device it is also likely to be the PrintMaster. There may only be one PrintMaster in a system and it is most likely to be a SoPEC device.

[1285] 12.1.1.3 LineSyncMaster Device

[1286] The LineSyncMaster device generates the lsync pulse that all SoPECs in the system must synchronize their line outputs with. Any SoPEC in the system could act as a LineSyncMaster although the PrintMaster is probably the most likely candidate. It is possible that the LineSyncMaster may not be a SoPEC device at all; it could, for example, come from some OEM motor control circuitry. There may only be one LineSyncMaster in a system.
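The uniqueness rules stated for the ISIMaster, PrintMaster and LineSyncMaster roles lend themselves to a simple configuration check. The Python sketch below is an illustration only: it assumes a system with a single ISI bus, and the function name, role strings and device names are hypothetical.

```python
from collections import Counter

# Roles of which the text allows at most one per system (single ISI bus assumed).
SINGLETON_ROLES = ("ISIMaster", "LineSyncMaster", "PrintMaster")


def check_roles(devices: dict) -> list:
    """Return a list of problems for a {device name: set of roles} mapping."""
    counts = Counter(role for roles in devices.values() for role in roles)
    return [
        f"{role}: at most 1 allowed, found {counts[role]}"
        for role in SINGLETON_ROLES
        if counts[role] > 1
    ]


# A device may hold several roles at once, as in the common case where the
# SoPEC connected to the host is ISIMaster, PrintMaster and LineSyncMaster.
system = {
    "sopec0": {"ISIMaster", "PrintMaster", "LineSyncMaster"},
    "sopec1": {"ISISlave"},
    "sopec2": {"ISISlave"},
}
problems = check_roles(system)
```

An empty result means the configuration respects the stated limits; adding a second PrintMaster to the mapping would produce one reported problem.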

[1287] 12.1.1.4 Storage Device

[1288] For certain printer types it may be realistic to use one SoPEC as a storage device without using its print engine capability, that is, to effectively use it as an ISI-attached DRAM. A storage SoPEC would receive data from the ISIMaster (most likely to be an ISI-Bridge chip) and then distribute it to the other SoPECs as required. No other type of data flow (e.g. ISISlave->storage SoPEC->ISISlave) would need to be supported in such a scenario. The SCB supports this functionality at no additional cost because the CPU handles the task of transferring outbound data from the embedded DRAM to the ISI transmit buffer. The CPU in a storage SoPEC will have almost nothing else to do.

[1289] 12.1.1.5 ISISlave Device

[1290] Multi-SoPEC systems will contain one or more ISISlave SoPECs. An ISISlave SoPEC is primarily used to generate dot data for the printhead IC it is driving. An ISISlave will not transmit messages on the ISI without first receiving permission to do so, via a ping packet (see section 12.4.4.6), from the ISIMaster.
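The rule that an ISISlave transmits only after being pinged can be modelled as a queue that is drained by ping events. This Python sketch does not describe the ISI packet format or the SCB hardware; the class name, its methods and the choice of releasing exactly one message per ping are assumptions made for illustration.

```python
from collections import deque


class ISISlaveModel:
    """Toy model: outbound messages wait until the ISIMaster sends a ping."""

    def __init__(self):
        self.pending = deque()

    def queue_message(self, message: bytes) -> None:
        """Software queues a message; nothing is put on the bus yet."""
        self.pending.append(message)

    def on_ping(self):
        """Called when a ping arrives; returns the message to transmit, or None.

        Releasing one message per ping is an assumption of this sketch.
        """
        return self.pending.popleft() if self.pending else None


slave = ISISlaveModel()
slave.queue_message(b"dead-nozzle-report")
slave.queue_message(b"band-finished")
first = slave.on_ping()
second = slave.on_ping()
third = slave.on_ping()
```

Because the ISIMaster alone decides when each slave may transmit, no arbitration between slaves is needed in this model.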

[1291] 12.1.1.6 ISI-Bridge Device

[1292] SoPEC is targeted at the low-cost small office/home office (SoHo) market. It may also be used in future systems that target different market segments which are likely to have a high speed interface capability. A future device, known as an ISI-Bridge chip, is envisaged which will feature both a high speed interface (such as High-Speed (HS) USB, Ethernet or IEEE1394) and one or more ISI interfaces. The use of multiple ISI buses would allow the construction of independent print systems within the one printer. The ISI-Bridge would be the ISIMaster for each of the ISI buses it interfaces to.

[1293] 12.1.1.7 External Host

[1294] The external host is most likely, but is not required, to be a PC. Any system that can act as a USB host or that can interface to an ISI-Bridge chip could be the external host. In particular, with the development of USB On-The-Go (USB OTG), it is possible that a number of USB OTG enabled products such as PDAs or digital cameras will be able to directly interface with a SoPEC printer.

[1295] 12.1.1.8 External USB Device

[1296] The external USB device is most likely, but is not required, to be a digital camera. Any system that can act as a USB device could be connected as an external USB device. This is to facilitate printing in the absence of a PC.

[1297] 12.1.2 Types of Communication

[1298] 12.1.2.1 Communications with External Host

[1299] The external host communicates directly with the ISIMaster in order to print pages. When the ISIMaster is a SoPEC, the communications channel is FS USB.

[1300] 12.1.2.1.1 External Host to ISIMaster Communication

[1301] The external host will need to communicate the following information to the ISIMaster device:

[1302] Communications channel configuration and maintenance information

[1303] Most data destined for PrintMaster, ISISlave or storage SoPEC devices. This data is simply relayed by the ISIMaster

[1304] Mapping of virtual communications channels, such as USB endpoints, to ISI destination

[1305] 12.1.2.1.2 ISIMaster to External Host Communication

[1306] The ISIMaster will need to communicate the following information to the external host:

[1307] Communications channel configuration and maintenance information

[1308] All data originating from the PrintMaster, ISISlave or storage SoPEC devices and destined for the external host. This data is simply relayed by the ISIMaster

[1309] 12.1.2.1.3 External Host to PrintMaster Communication

[1310] The external host will need to communicate the following information to the PrintMaster device:

[1311] Program code for the PrintMaster

[1312] Compressed page data for the PrintMaster

[1313] Control messages to the PrintMaster

[1314] Tables and static data required for printing e.g. dead nozzle tables, dither matrices etc.

[1315] Authenticatable messages to upgrade the printer's capabilities

[1316] 12.1.2.1.4 PrintMaster to External Host Communication

[1317] The PrintMaster will need to communicate the following information to the external host:

[1318] Printer status information (i.e. authentication results, paper empty/jammed etc.)

[1319] Dead nozzle information

[1320] Memory buffer status information

[1321] Power management status

[1322] Encrypted SoPEC_id for use in the generation of PRINTER_QA keys during factory programming

[1323] 12.1.2.1.5 External Host to ISISlave Communication

[1324] All communication between the external host and ISISlave SoPEC devices must be direct (via a dedicated connection between the external host and the ISISlave) or must take place via the ISIMaster. In the case of a SoPEC ISIMaster it is possible to configure each individual USB endpoint to act as a control channel to an ISISlave SoPEC if desired, although the endpoints will more usually be used to transport data. The external host will need to communicate the following information to ISISlave devices over the comms/ISI:

[1325] Program code for ISISlave SoPEC devices

[1326] Compressed page data for ISISlave SoPEC devices

[1327] Control messages to the ISISlave SoPEC (where a control channel is supported)

[1328] Tables and static data required for printing e.g. dead nozzle tables, dither matrices etc.

[1329] Authenticatable messages to upgrade the printer's capabilities

[1330] 12.1.2.1.6 ISISlave to External Host Communication

[1331] All communication between the ISISlave SoPEC devices and the external host must take place via the ISIMaster. The ISISlave will need to communicate the following information to the external host over the comms/ISI:

[1332] Responses to the external host's control messages (where a control channel is supported)

[1333] Dead nozzle information from the ISISlave SoPEC.

[1334] Encrypted SoPEC_id for use in the generation of PRINTER_QA keys during factory programming

[1335] 12.1.2.2 Communication with External USB Device

[1336] 12.1.2.2.1 ISIMaster to External USB Device Communication

[1337] Communications channel configuration and maintenance information.

[1338] 12.1.2.2.2 External USB Device to ISIMaster Communication

[1339] Print data from a function on the external USB device.

[1340] 12.1.2.3 Communication Over ISI

[1341] 12.1.2.3.1 ISIMaster to PrintMaster Communication

[1342] The ISIMaster and PrintMaster will often be the same physical device. When they are different devices then the following information needs to be exchanged over the ISI:

[1343] All data from the external host destined for the PrintMaster (see section 12.1.2.1.3).

[1344] This data is simply relayed by the ISIMaster

[1345] 12.1.2.3.2 PrintMaster to ISIMaster Communication

[1346] The ISIMaster and PrintMaster will often be the same physical device. When they are different devices then the following information needs to be exchanged over the ISI:

[1347] All data from the PrintMaster destined for the external host (see section 12.1.2.1.4).

[1348] This data is simply relayed by the ISIMaster

[1349] 12.1.2.3.3 ISIMaster to ISISlave Communication

[1350] The ISIMaster may wish to communicate the following information to the ISISlaves:

[1351] All data (including program code such as ISIId enumeration) originating from the external host and destined for the ISISlave (see section 12.1.2.1.5). This data is simply relayed by the ISIMaster

[1352] wake up from sleep mode

[1353] 12.1.2.3.4 ISISlave to ISIMaster Communication

[1354] The ISISlave may wish to communicate the following information to the ISIMaster:

[1355] All data originating from the ISISlave and destined for theexternal host (see section 12.1.2.1.6). This data is simply relayed bythe ISIMaster

[1356] 12.1.2.3.5 PrintMaster to ISISlave Communication

[1357] When the PrintMaster is not the ISIMaster all ISI communication is done in response to ISI ping packets (see 12.4.4.6). When the PrintMaster is the ISIMaster then it will of course communicate directly with the ISISlaves. The PrintMaster SoPEC may wish to communicate the following information to the ISISlaves:

[1358] Ink status e.g. requests for dotCount data i.e. the number of dots in each color fired by the printheads connected to the ISISlaves

[1359] configuration of GPIO ports e.g. for clutch control and lid open detect

[1360] power down command telling the ISISlave to enter sleep mode

[1361] ink cartridge fail information

[1362] This list is not complete and the time constraints associated with these requirements have yet to be determined.

[1363] In general the PrintMaster may need to be able to:

[1364] send messages to an ISISlave which will cause the ISISlave to return the contents of ISISlave registers to the PrintMaster or

[1365] to program ISISlave registers with values sent by the PrintMaster

[1366] This should be under the control of software running on the CPU which writes messages to the ISI/SCB interface.

[1367] 12.1.2.3.6 ISISlave to PrintMaster Communication

[1368] ISISlaves may need to communicate the following information to the PrintMaster:

[1369] ink status e.g. dotCount data i.e. the number of dots in each color fired by the printheads connected to the ISISlaves

[1370] band related information e.g. finished band interrupts

[1371] page related information i.e. buffer underrun, page finished interrupts

[1372] MMU security violation interrupts

[1373] GPIO interrupts and status e.g. clutch control and lid open detect

[1374] printhead temperature

[1375] printhead dead nozzle information from SoPEC printhead nozzle tests

[1376] power management status

[1377] This list is not complete and the time constraints associated with these requirements have yet to be determined.

[1378] As the ISI is an insecure interface, commands issued over the ISI should be of limited capability e.g. only limited register writes allowed. The software protocol needs to be constructed with this in mind. In general ISISlaves may need to return register or status messages to the PrintMaster or ISIMaster. They may also need to indicate to the PrintMaster or ISIMaster that a particular interrupt has occurred on the ISISlave. This should be under the control of software running on the CPU which writes messages to the ISI block.

[1379] 12.1.2.3.7 ISISlave to ISISlave Communication

[1380] The amount of information that will need to be communicated between ISISlaves will vary considerably depending on the printer configuration. In some systems ISISlave devices will only need to exchange small amounts of control information with each other while in other systems (such as those employing a storage SoPEC or extra USB connection) large amounts of compressed page data may be moved between ISISlaves. Scenarios where ISISlave to ISISlave communication is required include: (a) when the PrintMaster is not the ISIMaster, (b) QA Chip ink usage protocols, (c) data transmission from data storage SoPECs, (d) when there are multiple external host connections supplying data to the printer.

[1381] 12.1.3 SCB Block Diagram

[1382] The SCB consists of four main sub-blocks, as shown in the basic block diagram of FIG. 28.

[1383] 12.1.4 Definitions of I/Os

[1384] The toplevel I/Os of the SCB are listed in Table 32. A more detailed description of their functionality will be given in the relevant sub-block sections.

TABLE 32 SCB I/O

Port name | Pins | I/O | Description

Clocks and Resets
prst_n | 1 | In | System reset signal. Active low.
pclk | 1 | In | System clock.
usbclk | 1 | In | 48 MHz clock for the USB device and host cores. The cores also require a 12 MHz clock, which will be generated locally by dividing the 48 MHz clock by 4.
isi_cpr_reset_n | 1 | Out | Signal from the ISI indicating that ISI activity has been detected while in sleep mode and so the chip should be reset. Active low.
usbd_cpr_reset_n | 1 | Out | Signal from the USB device that a USB reset has occurred. Active low.

USB device IO transceiver signals
usbd_ts | 1 | Out | USB device IO transceiver (BUSB2_PM) driver three-state control. Active high enable.
usbd_a | 1 | Out | USB device IO transceiver (BUSB2_PM) driver data input.
usbd_se0 | 1 | Out | USB device IO transceiver (BUSB2_PM) single-ended zero input. Active high.
usbd_zp | 1 | In | USB device IO transceiver (BUSB2_PM) D+ receiver output.
usbd_zm | 1 | In | USB device IO transceiver (BUSB2_PM) D− receiver output.
usbd_z | 1 | In | USB device IO transceiver (BUSB2_PM) differential receiver output.
usbd_pull_up_en | 1 | Out | USB device pull-up resistor enable. Switches power to the external pull-up resistor, connected to the D+ line that is required for device identification to the USB. Active high.
usbd_vbus_sense | 1 | In | USB device VBUS power sense. Used to detect power on VBUS. NOTE: The IBM Cu11 PADS are 3.3 V, VBUS is 5 V. An external voltage conversion will be necessary, e.g. resistor divider network. Active high.

USB host IO transceiver signals
usbh_ts | 1 | Out | USB host IO transceiver (BUSB2_PM) driver three-state control. Active high enable.
usbh_a | 1 | Out | USB host IO transceiver (BUSB2_PM) driver data input.
usbh_se0 | 1 | Out | USB host IO transceiver (BUSB2_PM) single-ended zero input. Active high.
usbh_zp | 1 | In | USB host IO transceiver (BUSB2_PM) D+ receiver output.
usbh_zm | 1 | In | USB host IO transceiver (BUSB2_PM) D− receiver output.
usbh_z | 1 | In | USB host IO transceiver (BUSB2_PM) differential receiver output.
usbh_over_current | 1 | In | USB host port power over current indicator. Active high.
usbh_power_en | 1 | Out | USB host VBUS power enable. Used for port power switching. Active high.

CPU Interface
cpu_adr[n:2] | n-1 | In | CPU address bus.
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU.
scb_cpu_data[31:0] | 32 | Out | Read data bus to the CPU.
cpu_rwn | 1 | In | Common read/not-write signal from the CPU.
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access.
cpu_scb_sel | 1 | In | Block select from the CPU. When cpu_scb_sel is high both cpu_adr and cpu_dataout are valid.
scb_cpu_rdy | 1 | Out | Ready signal to the CPU. When scb_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the SCB and for a read cycle this means the data on scb_cpu_data is valid.
scb_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
scb_cpu_debug_valid | 1 | Out | Signal indicating that the data currently on scb_cpu_data is valid debug data.

Interrupt signals
dma_icu_irq | 1 | Out | DMA interrupt signal to the interrupt controller block.
isi_icu_irq | 1 | Out | ISI interrupt signal to the interrupt controller block.
usb_icu_irq[1:0] | 2 | Out | USB host and device interrupt signals to the ICU. Bit 0 - USB Host interrupt; Bit 1 - USB Device interrupt.

DIU interface
scb_diu_wadr[21:5] | 17 | Out | Write address bus to the DIU.
scb_diu_data[63:0] | 64 | Out | Data bus to the DIU.
scb_diu_wreq | 1 | Out | Write request to the DIU.
diu_scb_wack | 1 | In | Acknowledge from the DIU that the write request was accepted.
scb_diu_wvalid | 1 | Out | Signal from the SCB to the DIU indicating that the data currently on the scb_diu_data[63:0] bus is valid.
scb_diu_wmask[7:0] | 8 | Out | Byte aligned write mask. A “1” in a bit field of “scb_diu_wmask[7:0]” means that the corresponding byte will be written to DRAM.
scb_diu_rreq | 1 | Out | Read request to the DIU.
scb_diu_radr[21:5] | 17 | Out | Read address bus to the DIU.
diu_scb_rack | 1 | In | Acknowledge from the DIU that the read request was accepted.
diu_scb_rvalid | 1 | In | Signal from the DIU to the SCB indicating that the data currently on the diu_data[63:0] bus is valid.
diu_data[63:0] | 64 | In | Common DIU data bus.

GPIO interface
isi_gpio_dout[3:0] | 4 | Out | ISI output data to GPIO pins.
isi_gpio_e[3:0] | 4 | Out | ISI output enable to GPIO pins.
gpio_isi_din[3:0] | 4 | In | Input data from GPIO pins to ISI.

[1385] 12.1.5 SCB Data Flow

[1386] A logical view of the SCB is shown in FIG. 29, depicting the transfer of data within the SCB.

[1387] 12.2 USBD (USB Device Sub-Block)

[1388] 12.2.1 Overview

[1389] The FS USB device controller core and associated SCB logic are referred to as the USB Device (USBD).

[1390] A SoPEC printer has FS USB device capability to facilitate communication between an external USB host and a SoPEC printer. The USBD is self-powered. It connects to an external USB host via a dedicated USB interface on the SoPEC printer, comprising a USB connector, the necessary discretes for USB signalling and the associated SoPEC ASIC I/Os.

[1391] The FS USB device core will be third party IP from Synopsys: TymeWare™ USB1.1 Device Controller (UDCVCI). Refer to the UDCVCI User Manual [20] for a description of the core.

[1392] The device core does not support LS USB operation. Control and bulk transfers are supported by the device. Interrupt transfers are not considered necessary because the required interrupt-type functionality can be achieved by sending query messages over the control channel on a scheduled basis. There is no requirement to support isochronous transfers.

[1393] The device core is configured to support 6 USB endpoints (EPs): the default control EP (EP0), 4 bulk OUT EPs (EP1, EP2, EP3, EP4) and 1 bulk IN EP (EP5). It should be noted that the direction of each EP is with respect to the USB host, i.e. IN refers to data transferred to the external host and OUT refers to data transferred from the external host. The 4 bulk OUT EPs will be used for the transfer of data from the external host to SoPEC, e.g. compressed page data, program data or control messages. Each bulk OUT EP can be mapped on to any target destination in a multi-SoPEC system, via the SCB Map configuration registers. The bulk IN EP is used for the transfer of data from SoPEC to the external host, e.g. a print image downloaded from a digital camera that requires processing on the external host system. Any feedback data will be returned to the external host on EP0, e.g. status information.

[1394] The device core does not provide internal buffering for any of its EPs (with the exception of the 8 byte setup data payload for control transfers). All EP buffers are provided in the SCB. Buffers will be grouped according to EP direction and associated packet destination. The SCB Map configuration registers contain a DestISIId and DestISISubId for each OUT EP, defining their EP mapping and therefore their packet destination. Refer to Section 12.4 ISI (Inter SoPEC Interface Sub-block) for further details on ISIId and ISISubId. Refer to Section 12.5 CTRL (Control Sub-block) for further details on the mapping of OUT EPs.

[1395] 12.2.2 USBD Effective Bandwidth

[1396] The effective bandwidth between an external USB host and the printer will be influenced by:

[1397] Amount of activity from other devices that share the USB with the printer.

[1398] Throughput of the device controller core.

[1399] EP buffering implementation.

[1400] Responsiveness of the external host system CPU in handling USB interrupts.

[1401] To maximize bandwidth to the printer it is recommended that no other devices are active on the USB between the printer and the external host. If the printer is connected to a HS USB external host or hub it may limit the bandwidth available to other devices connected to the same hub but it would not significantly affect the bandwidth available to other devices upstream of the hub. The EP buffering should not limit the USB device core throughput, under normal operating conditions. Used in the recommended configuration, under ideal operating conditions, it is expected that an effective bandwidth of 8-9 Mbit/s will be achieved with bulk transfers between the external host and the printer.

[1402] 12.2.3 IN EP Packet Buffer

[1403] The IN EP packet buffer stores packets originating from the LEON CPU that are destined for transmission over the USB to the external USB host. CPU writes to the buffer are 32 bits wide. USB device core reads from the buffer are 32 bits wide.

[1404] 128 bytes of local memory are required in total for EP0-IN and EP5-IN buffering. The IN EP buffer is a single, 2-port local memory instance, with a dedicated read port and a dedicated write port. Both ports are 32 bits wide. Each IN EP has a dedicated 64 byte packet location available in the memory array to buffer a single USB packet (maximum USB packet size is 64 bytes). Each individual 64 byte packet location is structured as 16×32 bit words and is read/written in a FIFO manner. When the device core reads a packet entry from the IN EP packet buffer, the buffer must retain the packet until the device core performs a status write, informing the SCB that the packet has been accepted by the external USB host and can be flushed. The CPU can therefore only write a single packet at a time to each IN EP. Any subsequent CPU write request to a buffer location containing a valid packet will be refused, until that packet has been successfully transmitted.
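The retain-until-accepted behaviour of a single IN EP packet location can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the class name `InEpPacketSlot` and its method names are hypothetical, and the model ignores the 32 bit port width and FIFO word ordering.

```python
MAX_USB_PACKET_BYTES = 64  # maximum FS USB packet size stated in the text


class InEpPacketSlot:
    """Hypothetical model of one 64 byte IN EP packet location."""

    def __init__(self):
        self._packet = None  # None means the location holds no valid packet

    def cpu_write(self, packet):
        """CPU write request; returns False if the request is refused."""
        if len(packet) > MAX_USB_PACKET_BYTES:
            raise ValueError("packet exceeds the 64 byte packet location")
        if self._packet is not None:
            return False  # a valid packet is still awaiting transmission
        self._packet = bytes(packet)
        return True

    def core_read(self):
        """Device core read; the packet is retained, not flushed."""
        return self._packet

    def status_write_accepted(self):
        """Device core status write: host accepted the packet, so flush it."""
        self._packet = None
```

In this model a second `cpu_write` keeps returning `False`, even after `core_read`, until `status_write_accepted` is called.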

[1405] 12.2.4 OUT EP Packet Buffer

[1406] The OUT EP packet buffer stores packets originating from the external USB host that are destined for transmission over DMAChannel0, DMAChannel1 or the ISI. The SCB control logic is responsible for routing the OUT EP packets from the OUT EP packet buffer to DMA or to the ISITxBuffer, based on the SCB Map configuration register settings. USB core writes to the buffer are 32 bits wide. DMA and ISI associated reads from the buffer are both 64 bits wide.

[1407] 512 bytes of local memory are required in total for EP0-OUT, EP1-OUT, EP2-OUT, EP3-OUT and EP4-OUT buffering. The OUT EP packet buffer is a single, 2-port local memory instance, with a dedicated read port and a dedicated write port. Both ports are 64 bits wide. Byte enables are used for the 32 bit wide USB device core writes to the buffer. Each OUT EP can be mapped to DMAChannel0, DMAChannel1 or the ISI.

[1408] The OUT EP packet buffer is partitioned accordingly, resulting in three distinct packet FIFOs:

[1409] USBDDMA0FIFO, for USB packets destined for DMAChannel0 on the local SoPEC.

[1410] USBDDMA1FIFO, for USB packets destined for DMAChannel1 on the local SoPEC.

[1411] USBDISIFIFO, for USB packets destined for transmission over the ISI.

[1412] 12.2.4.1 USBDDMAnFIFO

[1413] This description applies to USBDDMA0FIFO and USBDDMA1FIFO, where ‘n’ represents the respective DMA channel, i.e. n=0 for USBDDMA0FIFO, n=1 for USBDDMA1FIFO. USBDDMAnFIFO services any EPs mapped to DMAChanneln on the local SoPEC device. This implies that a packet originating from an EP with an associated ISIId that matches the local SoPEC ISIId and an ISISubId=n will be written to USBDDMAnFIFO, if there is space available for that packet.

[1414] USBDDMAnFIFO has a capacity of 2×64 byte packet entries, and can therefore buffer up to 2 USB packets. It can be considered as a 2 packet entry FIFO. Packets will be read from it in the same order in which they were written, i.e. the first packet written will be the first packet read and the second packet written will be the second packet read. Each individual 64 byte packet location is structured as 8×64 bit words and is read/written in a FIFO manner.

[1415] The USBDDMAnFIFO has a write granularity of 64 bytes, to allow for the maximum USB packet size. The USBDDMAnFIFO will have a read granularity of 32 bytes to allow for the DMA write access bursts of 4×64 bit words, i.e. the DMA Manager will read 32 byte chunks at a time from the USBDDMAnFIFO 64 byte packet entries, for transfer to the DIU.

[1416] It is conceivable that a packet which is not a multiple of 32 bytes in size may be written to the USBDDMAnFIFO. When this event occurs, the DMA Manager will read the contents of the remaining address locations associated with the 32 byte chunk in the USBDDMAnFIFO, transferring the packet plus whatever data is present in those locations, resulting in a 32 byte packet (a burst of 4×64 bit words) transfer to the DIU.
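The 32 byte read granularity, including the case of a packet whose size is not a multiple of 32 bytes, can be illustrated with a small sketch. The function name `dma_read_chunks` and the representation of a packet entry as a 64 byte `bytes` object are assumptions made for illustration only; the actual DMA Manager is hardware.

```python
CHUNK_BYTES = 32         # DMA burst: 4 x 64 bit words
PACKET_ENTRY_BYTES = 64  # one USBDDMAnFIFO packet entry


def dma_read_chunks(entry, packet_len):
    """Return the 32 byte chunks read for a packet of packet_len bytes.

    `entry` is the full 64 byte packet location, i.e. the packet followed
    by whatever data was already present in the remaining locations.
    """
    if len(entry) != PACKET_ENTRY_BYTES:
        raise ValueError("a packet entry is exactly 64 bytes")
    if not 0 < packet_len <= PACKET_ENTRY_BYTES:
        raise ValueError("packet length must be 1..64 bytes")
    n_chunks = -(-packet_len // CHUNK_BYTES)  # ceiling division
    return [entry[i * CHUNK_BYTES:(i + 1) * CHUNK_BYTES]
            for i in range(n_chunks)]
```

For example, a 40 byte packet yields two chunks; the second carries 8 bytes of packet data followed by 24 bytes of whatever the entry previously held.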

[1417] The DMA channels should achieve an effective bandwidth of 160 Mbits/sec (1 bit/cycle) and should never become blocked, under normal operating conditions. As the USB bandwidth is considerably less, a 2 entry packet FIFO for each DMA channel should be sufficient.

[1418] 12.2.4.2 USBDISIFIFO

[1419] USBDISIFIFO services any EPs mapped to ISI. This implies that a packet originating from an EP with an associated ISIId that does not match the local SoPEC ISIId will be written to USBDISIFIFO if there is space available for that packet.
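The FIFO selection rule described for USBDDMAnFIFO and USBDISIFIFO can be summarised as a short sketch. The names `EpMapping` and `select_out_fifo` are hypothetical, and the handling of a local ISIId combined with an ISISubId other than 0 or 1 is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpMapping:
    dest_isi_id: int      # DestISIId from the SCB Map configuration registers
    dest_isi_sub_id: int  # DestISISubId from the SCB Map configuration registers


def select_out_fifo(mapping, local_isi_id):
    """Pick the OUT EP packet FIFO for a packet from an EP with this mapping."""
    if mapping.dest_isi_id != local_isi_id:
        return "USBDISIFIFO"  # destined for another SoPEC, sent over the ISI
    if mapping.dest_isi_sub_id in (0, 1):
        return f"USBDDMA{mapping.dest_isi_sub_id}FIFO"  # local DMAChanneln
    # Assumption: other local sub-ids are not covered by the description.
    raise ValueError("no FIFO described for this local ISISubId")
```

With a local ISIId of 0, an EP mapped to (0, 1) selects USBDDMA1FIFO, while an EP mapped to (3, 0) selects USBDISIFIFO.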

[1420] USBDISIFIFO has a capacity of 4×64 byte packet entries, and can therefore buffer up to 4 USB packets. It can be considered as a 4 packet entry FIFO. Packets will be read from it in the same order in which they were written, i.e. the first packet written will be the first packet read and the second packet written will be the second packet read, etc. Each individual 64 byte packet location is structured as 8×64 bit words and is read/written in a FIFO manner.

[1421] The ISI long packet format will be used to transfer data across the ISI. Each ISI long packet data payload is 32 bytes. The USBDISIFIFO has a write granularity of 64 bytes, to allow for the maximum USB packet size. The USBDISIFIFO will have a read granularity of 32 bytes to allow for the ISI packet size, i.e. the SCB will read 32 byte chunks at a time from the USBDISIFIFO 64 byte packet entries, for transfer to the ISI.

[1422] It is conceivable that a packet which is not a multiple of 32 bytes in size may be written to the USBDISIFIFO, either intentionally or due to a software error. A maskable interrupt per EP is provided to flag this event. There will be 2 options for dealing with this scenario on a per EP basis:

[1423] Discard the packet.

[1424] Read the contents of the remaining address locations associated with the 32 byte chunk in the USBDISIFIFO, transferring the irregular size packet plus whatever data is present in those locations, resulting in a 32 byte packet transfer to the ISITxBuffer.

[1425] The ISI should achieve an effective bandwidth of 100 Mbits/sec (4 wire configuration). It is possible to encounter a number of retries when transmitting an ISI packet and the LEON CPU will require access to the ISI transmit buffer. However, considering the relatively low bandwidth of the USB, a 4 packet entry FIFO should be sufficient.

[1426] 12.2.5 Wake-Up From Sleep Mode

[1427] The SoPEC will be placed in sleep mode after a suspend command is received by the USB device core. The USB device core will continue to be powered and clocked in sleep mode. A USB reset, as opposed to a device resume, will be required to bring SoPEC out of its sleep state as the sleep state is hoped to be logically equivalent to the power down state.

[1428] The USB reset signal originating from the USB controller will be propagated to the CPR (as usbd_cpr_reset_n) if the USBWakeupEnable bit of the WakeupEnable register (see Table) has been set. The USBWakeupEnable bit should therefore be set just prior to entering sleep mode. There is a scenario that would require SoPEC to initiate a USB remote wake-up (i.e. where SoPEC signals resume to the external USB host after being suspended by the external USB host). A digital camera (or other supported external USB device) could be connected to SoPEC via the internal SoPEC USB host controller core interface. There may be a need to transfer data from this external USB device, via SoPEC, to the external USB host system for processing. If the USB connecting the external host system and SoPEC was suspended, then SoPEC would need to initiate a USB remote wake-up.

[1429] 12.2.6 Implementation

[1430] 12.2.6.1 USBD Sub-Block Partition

[1431] Block diagram

[1432] Definition of I/Os

[1433] 12.2.6.2 USB Device IP Core

[1434] 12.2.6.3 PVCI Target

[1435] 12.2.6.4 IN EP Buffer

[1436] 12.2.6.5 OUT EP Buffer

[1437] 12.3 USBH (USB Host Sub-Block)

[1438] 12.3.1 Overview

[1439] The SoPEC USB Host Controller (HC) core, associated SCB logic and associated SoPEC ASIC I/Os are referred to as the USB Host (USBH).

[1440] A SoPEC printer has FS USB host capability, to facilitate communication between an external USB device and a SoPEC printer. The USBH connects to an external USB device via a dedicated USB interface on the SoPEC printer, comprising a USB connector, the necessary discretes for USB signalling and the associated SoPEC ASIC I/Os.

[1441] The FS USB HC core is third party IP from Synopsys: DesignWare® USB1.1 OHCI Host Controller with PVCI (UHOSTC_PVCI). Refer to the UHOSTC_PVCI User Manual [18] for details of the core. Refer to the Open Host Controller Interface (OHCI) Specification Release [19] for details of OHCI operation.

[1442] The HC core supports Low-Speed (LS) USB devices, although compatible external USB devices are most likely to be FS devices. It is expected that communication between an external USB device and a SoPEC printer will be achieved with control and bulk transfers. However, isochronous and interrupt transfers are also supported by the HC core.

[1443] There will be 2 communication channels between the Host Controller Driver (HCD) software running on the LEON CPU and the HC core:

[1444] OHCI operational registers in the HC core. These registers are control, status, list pointers and a pointer to the Host Controller Communications Area (HCCA) in shared memory. A target Peripheral Virtual Component Interface (PVCI) on the HC core will provide LEON with direct read/write access to the operational registers. Refer to the OHCI Specification for details of these registers.

[1445] HCCA in SoPEC eDRAM. An initiator Peripheral Virtual Component Interface (PVCI) on the HC core will provide the HC with DMA read/write access to an address space in eDRAM. The HCD running on LEON will have read/write access to the same address space. Refer to the OHCI Specification for details of the HCCA.

[1446] The target PVCI interface is a 32 bit word aligned interface, with byte enables for write access. All read/write access to the target PVCI interface by the LEON CPU will be 32 bit word aligned. The byte enables will not be used, as all registers will be read and written as 32 bit words.

[1447] The initiator PVCI interface is a 32 bit word aligned interface with byte enables for write access. All DMA read/write accesses are 256 bit word aligned, in bursts of 4×64 bit words. As there is no guarantee that the read/write requests from the HC core will start at a 256 bit boundary or be 256 bits long, it is necessary to provide 8 byte enables for each of the 64 bit words in a write burst from the HC core to DMA. The signal scb_diu_wmask serves this purpose.

[1448] Configuration of the HC core will be performed by the HCD.

[1449] 12.3.2 Read/Write Buffering

[1450] The HC core maximum burst size for a read/write access is 4×32 bit words. This implies that the minimum buffering requirements for the HC core will be a 1 entry deep address register and a 4 entry deep data register. It will be necessary to provide data and address mapping functionality to convert the 4×32 bit word HC core read/write bursts into 4×64 bit word DMA read/write bursts. This will meet the minimum buffering requirements.

[1451] 12.3.3 USBH Effective Bandwidth

[1452] The effective bandwidth between an external USB device and a SoPEC printer will be influenced by:

[1453] Amount of activity from other devices that share the USB with the external USB device.

[1454] Throughput of the HC core.

[1455] HC read/write buffering implementation.

[1456] Responsiveness of the LEON CPU in handling USB interrupts.

[1457] Effective bandwidth between an external USB device and a SoPEC printer is not an issue. The primary application of this connectivity is the download of a print image from a digital camera. Printing speed is not important for this type of print operation. However, to maximize bandwidth to the printer it is recommended that no other devices are active on the USB between the printer and the external USB device. The HC read/write buffering in the SCB should not limit the USB HC core throughput, under normal operating conditions.

[1458] Used in the recommended configuration, under ideal operating conditions, it is expected that an effective bandwidth of 8-9 Mbit/s will be achieved with bulk transfers between the external USB device and the SoPEC printer.

[1459] 12.3.4 Implementation

[1460] 12.3.5 USBH Sub-Block Partition

[1461] USBH Block Diagram

[1462] Definition of I/Os.

[1463] 12.3.5.1 USB Host IP Core

[1464] 12.3.5.2 PVCI Target

[1465] 12.3.5.3 PVCI Initiator

[1466] 12.3.5.4 Read/Write Buffer

[1467] 12.4 ISI (Inter SoPEC Interface Sub-Block)

[1468] 12.4.1 Overview

[1469] The ISI is utilised in all system configurations requiring more than one SoPEC. An example of such a system which requires four SoPECs for duplex A3 printing and an additional SoPEC used as a storage device is shown in FIG. 27.

[1470] The ISI performs much the same function between an ISISlave SoPEC and the ISIMaster as the USB connection performs between the ISIMaster and the external host. This includes the transfer of all program data, compressed page data and message (i.e. commands or status information) passing between the ISIMaster and the ISISlave SoPECs. The ISIMaster initiates all communication with the ISISlaves.

[1471] 12.4.2 ISI Effective Bandwidth

[1472] The ISI will need to run at a speed that will allow error free transmission on the PCB while minimising the buffering and hardware requirements on SoPEC. While an ISI speed of 10 Mbit/s is adequate to match the effective FS USB bandwidth it would limit the system performance when a high-speed connection (e.g. USB2.0, IEEE1394) is used to attach the printer to the PC. Although they would require the use of an extra ISI-Bridge chip such systems are envisaged for more expensive printers (compared to the low-cost basic SoPEC powered printers that are initially being targeted) in the future.

[1473] An ISI line speed (i.e. the speed of each individual ISI wire) of 32 Mbit/s is therefore proposed as it will allow ISI data to be over-sampled 5 times (at a pclk frequency of 160 MHz). The total bandwidth of the ISI will depend on the number of pins used to implement the interface. The ISI protocol will work equally well if 2 or 4 pins are used for transmission/reception. The ISINumPins register is used to select between a 2 or 4 wire ISI, giving peak raw bandwidths of 64 Mbit/s and 128 Mbit/s respectively. Using either a 2 or 4 wire ISI solution would allow the movement of data in to and out of a storage SoPEC (as described in 12.1.1.4 above), which is the most bandwidth hungry ISI use, in a timely fashion.
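The line speed and peak raw bandwidth figures follow directly from the pclk frequency, the over-sampling factor and the number of wires. The short sketch below reproduces that arithmetic; the function names are illustrative only and do not correspond to any SoPEC register or API.

```python
PCLK_HZ = 160_000_000  # pclk frequency given in the text
OVERSAMPLE = 5         # each ISI bit is over-sampled 5 times


def isi_line_rate(pclk_hz=PCLK_HZ, oversample=OVERSAMPLE):
    """Bit rate of one ISI wire in bit/s."""
    return pclk_hz // oversample


def isi_peak_raw_bandwidth(num_pins):
    """Peak raw ISI bandwidth in bit/s for a 2 or 4 wire ISI."""
    if num_pins not in (2, 4):
        raise ValueError("the ISI is described only for 2 or 4 wires")
    return num_pins * isi_line_rate()
```

These are raw figures; the effective bandwidth is lower once protocol overheads and bus turnaround times are taken into account.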

[1474] The ISINumPins register is used to select between a 2 or 4 wire ISI. A 2 wire ISI is the default setting for ISINumPins and this may be changed to a 4 wire ISI after initial communication has been established between the ISIMaster and all ISISlaves. Software needs to ensure that the switch from 2 to 4 wires is handled in a controlled and coordinated fashion so that nothing is transmitted on the ISI during the switch over period.

[1475] The maximum effective bandwidth of a two wire ISI, after allowing for protocol overheads and bus turnaround times, is expected to be approx. 50 Mbit/s.

[1476] 12.4.3 ISI Device Identification and Enumeration

[1477] The ISIMasterSel bit of the ISICntrl register (see section Table) determines whether a SoPEC is an ISIMaster (ISIMasterSel=1), or an ISISlave (ISIMasterSel=0).

[1478] SoPEC defaults to being an ISISlave (ISIMasterSel=0) after a power-on reset, i.e. it will not transmit data on the ISI without first receiving a ping. If a SoPEC's ISIMasterSel bit is changed to 1, then that SoPEC will become the ISIMaster, transmitting data without requiring a ping, and generating pings as appropriately programmed.

[1479] ISIMasterSel can be set to 1 explicitly by the CPU writing directly to the ISICntrl register. ISIMasterSel can also be automatically set to 1 when activity occurs on any of USB endpoints 2-4 and the AutoMasterEnable bit of the ISICntrl register is also 1 (the default reset condition). Note that if AutoMasterEnable is 0, then activity on USB endpoints 2-4 will not result in ISIMasterSel being set to 1. USB endpoints 2-4 are chosen for the automatic detection since the power-on-reset condition has USB endpoints 0 and 1 pointing to ISIId 0 (which matches the local SoPEC's ISIId after power-on reset). Thus any transmission on USB endpoints 2-4 indicates a desire to transmit on the ISI which would usually indicate ISIMaster status. The automatic setting of ISIMasterSel can be disabled by clearing AutoMasterEnable, thereby allowing the SoPEC to remain an ISISlave while still making use of the USB endpoints 2-4 as external destinations.
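The ISIMasterSel behaviour can be expressed as a small state model. This is a sketch under stated assumptions: the class `IsiCntrlModel` and its method names are hypothetical, only the two bits named in the text are modelled, and the remaining ISICntrl fields are omitted.

```python
from dataclasses import dataclass


@dataclass
class IsiCntrlModel:
    """Hypothetical model of two ISICntrl bits at their reset defaults."""
    isi_master_sel: int = 0      # reset default: ISISlave
    auto_master_enable: int = 1  # reset default: automatic selection enabled

    def cpu_write(self, isi_master_sel=None, auto_master_enable=None):
        """Explicit CPU write to either bit."""
        if isi_master_sel is not None:
            self.isi_master_sel = isi_master_sel & 1
        if auto_master_enable is not None:
            self.auto_master_enable = auto_master_enable & 1

    def usb_endpoint_activity(self, endpoint):
        """Activity on USB endpoints 2-4 may promote the SoPEC to ISIMaster."""
        if endpoint in (2, 3, 4) and self.auto_master_enable:
            self.isi_master_sel = 1

    @property
    def is_isi_master(self):
        return self.isi_master_sel == 1
```

In this model, activity on endpoint 1 leaves the device an ISISlave, while activity on endpoint 2 sets ISIMasterSel unless AutoMasterEnable has been cleared beforehand.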

[1480] Thus the setting of a SoPEC being ISIMaster or ISISlave can be completely under software control, or can be completely automatic.

[1481] The ISIId is established by software downloaded over the ISI (in broadcast mode) which looks at the input levels on a number of GPIO pins to determine the ISIId. For any given printer that uses a multi-SoPEC configuration it is expected that there will always be enough free GPIO pins on the ISISlaves to support this enumeration mechanism.

[1482] 12.4.4 ISI Protocol

[1483] The ISI is a serial interface utilizing a 2/4 wire half-duplex configuration such as the 2-wire system shown in FIG. 30 below. An ISIMaster must always be present and a variable number of ISISlaves may also be on the ISI bus. The ISI protocol supports up to 14 addressable slaves; however, to simplify electrical issues the ISI drivers need only allow for 5-6 ISI devices on a particular ISI bus. The ISI bus enables broadcasting of data, ISIMaster to ISISlave communication, ISISlave to ISIMaster communication and ISISlave to ISISlave communication. Flow control, error detection and retransmission of errored packets are also supported. ISI transmission is asynchronous and a Start field is present in every transmitted packet to ensure synchronization for the duration of the packet.

[1484] To maximize the effective ISI bandwidth while minimising pin requirements a half-duplex interleaved transmission scheme is used. FIG. 31 below shows how a 16-bit word is transmitted from an ISIMaster to an ISISlave over a 2-wire ISI bus. Since data will be interleaved over the wires and a 4-wire ISI is also supported, all ISI packets should be a multiple of 4 bits.

[1485] All ISI transactions are initiated by the ISIMaster and every non-broadcast data packet needs to be acknowledged by the addressed recipient. An ISISlave may only transmit when it receives a ping packet (see section 12.4.4.6) addressed to it. To avoid bus contention all ISI devices must wait ISITurnAround bit-times (5 pclk cycles per bit) after detecting the end of a packet before transmitting a packet (assuming they are required to transmit). All non-transmitting ISI devices must tristate their Tx drivers to avoid line contention. The ISI protocol is defined to avoid devices driving out of order (e.g. when an ISISlave is no longer being addressed). As the ISI uses standard I/O pads there is no physical collision detection mechanism.

[1486] There are three types of ISI packet: a long packet (used for data transmission), a ping packet (used by the ISIMaster to prompt ISISlaves for packets) and a short packet (used to acknowledge receipt of a packet). All ISI packets are delineated by Start and Stop fields and transmission is atomic i.e. an ISI packet may not be split or halted once transmission has started.

[1487] 12.4.4.1 ISI Transactions

[1488] The different types of ISI transactions are outlined in FIG. 32below. As described later all NAKs are inferred and ACK_(S) are notaddressed to any particular ISI device.

[1489] 12.4.4.2 Start Field Description

[1490] The Start field serves two purposes: To allow the start of apacket be unambiguously identified and to allow the receiving devicesynchronise to the data stream. The symbol, or data value, used toidentify a Start field must not legitimately occur in the ensuingpacket. Bit stuffing is used to guarantee that the Start symbol will beunique in any valid (i.e. error free) packet. The ISI needs to see avalid Start symbol before packet reception can commence i.e. the receivelogic constantly looks for a Start symbol in the incoming data and willreject all data until it sees a Start symbol. Furthermore if a Startsymbol occurs (incorrectly) during a data packet it will be treated asthe start of a new packet. In this case the partially received packetwill be discarded.

[1491] The data value of the Start symbol should guarantee that an adequate number of transitions occur on the physical ISI lines to allow the receiving ISI device to determine the best sampling window for the transmitted data. The Start symbol should also be sufficiently long to ensure that the bit stuffing overhead is low but should still be short enough to reduce its own contribution to the packet overhead. A Start symbol of b01010101 is therefore used as it is an effective compromise between these constraints.

[1492] Each SoPEC in a multi-SoPEC system will derive its system clock from a unique (i.e. one per SoPEC) crystal. The system clocks of each device will drift relative to each other over any period of time. The system clocks are used for generation and sampling of the ISI data. Therefore the sampling window can drift and could result in incorrect data values being sampled at a later point in time. To overcome this problem the ISI receive circuitry tracks the sampling window against the incoming data to ensure that the data is sampled in the centre of the bit period.

[1493] 12.4.4.3 Stop Field Description

[1494] A 1 bit-time Stop field of b1 per ISI line ensures that all ISI lines return to the high state before the next packet is transmitted. The Stop field is driven on to each ISI line simultaneously, i.e. b11 for a 2-wire ISI and b1111 for a 4-wire ISI would be interleaved over the respective ISI lines. Each ISI line is driven high for 1 bit-time. This is necessary because the first bit of the Start field is b0.

[1495] 12.4.4.4 Bit Stuffing

[1496] Bit stuffing involves the insertion of bits into the bitstream at the transmitting SoPEC to avoid certain data patterns. The receiving SoPEC will strip these inserted bits from the bitstream.

[1497] Bit-stuffing will be performed when the Start symbol appears at a location other than the Start field of any packet, i.e. when the bit pattern b0101010 occurs at the transmitter, a 0 will be inserted to escape the Start symbol, resulting in the bit pattern b01010100. Conversely, when the bit pattern b0101010 occurs at the receiver, if the next bit is a ‘0’ it will be stripped, if it is a ‘1’ then a Start symbol is detected.
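The escape rule above can be sketched in a few lines of Python. This is only an illustrative model, not the SoPEC logic: the function names `stuff` and `unstuff` are hypothetical, bits are assumed to be listed in the order they appear on a single ISI line, and the sketch ignores the separate bit stuffing counter for long runs of identical bits.

```python
# Illustrative model of Start-symbol escaping on one ISI line.
# Assumption: bits are listed in transmission order, as the patterns are written in the text.
START_PREFIX = [0, 1, 0, 1, 0, 1, 0]  # first 7 bits of the Start symbol b01010101


def stuff(bits):
    """Transmitter side: insert a 0 after every occurrence of b0101010."""
    wire = []
    for bit in bits:
        wire.append(bit)
        if wire[-7:] == START_PREFIX:
            wire.append(0)  # escape bit, so b01010101 cannot appear in the packet body
    return wire


def unstuff(wire):
    """Receiver side: strip the escape bit that follows b0101010."""
    data = []
    history = []           # last 7 bits seen on the wire, escape bits included
    expect_escape = False
    for bit in wire:
        if expect_escape:
            if bit == 1:
                # b0101010 followed by 1 is a Start symbol, not packet data
                raise ValueError("Start symbol detected inside packet")
            expect_escape = False  # the 0 is an escape bit and is dropped
        else:
            data.append(bit)
        history = (history + [bit])[-7:]
        if history == START_PREFIX:
            expect_escape = True
    return data


payload = [0, 1, 0, 1, 0, 1, 0, 1]
assert stuff(payload) == [0, 1, 0, 1, 0, 1, 0, 0, 1]
assert unstuff(stuff(payload)) == payload
```

Both sides track the bits as they appear on the wire, so the transmitter and receiver agree on where each escape bit sits.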

[1498] If the frequency variations in the quartz crystal were large enough, it is conceivable that the resultant frequency drift over a large number of consecutive 1s or 0s could cause the receiving SoPEC to lose synchronisation. The quartz crystal that will be used in SoPEC systems is rated for 32 MHz @ 100 ppm. In a multi-SoPEC system with a 32 MHz+100 ppm crystal and a 32 MHz-100 ppm crystal (a relative difference of about 200 ppm), it would take approximately 5000 pclk cycles to cause a drift of 1 pclk cycle. At 5 pclk cycles per ISI bit, this means that we would only need to bit-stuff somewhere before 1000 ISI bits of consecutive 1s or consecutive 0s, to ensure adequate synchronization. As the maximum number of bits transmitted per ISI line in a packet is 145, it should not be necessary to perform bit-stuffing for consecutive 1s or 0s. We may wish to constrain the spec of xtalin and also xtalin for the ISI-Bridge chip to ensure the ISI cannot drift out of sync during packet reception.
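The drift estimate can be reproduced with a short calculation. The figures are those quoted in the text; the variable names are illustrative only, and the relative clock difference is approximated as the sum of the two tolerances.

```python
# Worked version of the clock drift estimate (values taken from the text).
PPM_TOLERANCE = 100       # each crystal is rated to +/-100 ppm
PCLK_PER_ISI_BIT = 5      # 5 pclk cycles per ISI bit
MAX_BITS_PER_LINE = 145   # maximum bits per ISI line in one packet

# Worst case: one crystal is fast by 100 ppm, the other slow by 100 ppm.
worst_case_ppm = 2 * PPM_TOLERANCE

# Number of pclk cycles needed for the two clocks to slip by one pclk cycle.
cycles_per_slip = 1_000_000 // worst_case_ppm

# The same interval expressed in ISI bit-times.
bits_per_slip = cycles_per_slip // PCLK_PER_ISI_BIT

# Drift accumulated over the longest packet, in pclk cycles.
drift_in_packet = MAX_BITS_PER_LINE * PCLK_PER_ISI_BIT / cycles_per_slip

print(cycles_per_slip, bits_per_slip, drift_in_packet)
```

Under these assumptions the drift over a full packet stays well below one pclk cycle, which is consistent with the conclusion that run-length bit stuffing should not be needed.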

[1499] Note that any violation of bit stuffing will result in the RxFrameErrorSticky status bit being set and the incoming packet will be treated as an errored packet.

[1500] 12.4.4.5 ISI Long Packet

[1501] The format of a long ISI packet is shown in FIG. 33 below. Data may only be transferred between ISI devices using a long packet as both the short and ping packets have no payload field. Except in the case of a broadcast packet, the receiving ISI device will always reply to a long packet with an explicit ACK (if no error is detected in the received packet) or will not reply at all (e.g. an error is detected in the received packet), leaving the transmitter to infer a NAK. As with all ISI packets the bitstream of a long packet is transmitted with its lsb (the leftmost bit in FIG. 33) first. Note that the total length (in bits) of an ISI long packet differs slightly between a 2 and 4-wire ISI system due to the different number of bits required for the Start and Stop fields.

[1502] All long packets begin with the Start field as described earlier. The PktDesc field is described in Table 33.

TABLE 33 PktDesc field description
Bit    Description
0:1    00 - Long packet
       01 - Reserved
       10 - Ping packet
       11 - Reserved
2      Sequence bit value. Only valid for long packets. See section 12.4.4.9 for a description of sequence bit operation.

[1503] Any ISI device in the system may transmit a long packet but only the ISIMaster may initiate an ISI transaction using a long packet. An ISISlave may only send a long packet in reply to a ping message from the ISIMaster. A long packet from an ISISlave may be addressed to any ISI device in the system.

[1504] The Address field is straightforward and complies with the ISI naming convention described in section 12.5.

[1505] The payload field is exactly what is in the transmit buffer of the transmitting ISI device and gets copied into the receive buffer of the addressed ISI device(s). When present the payload field is always 256 bits.

[1506] To ensure strong error detection a 16-bit CRC is appended.

[1507] 12.4.4.6 ISI Ping Packet

[1508] The ISI ping packet is used to allow ISISlaves to transmit on the ISI bus. As can be seen from FIG. 34 below the ping packet can be viewed as a special case of the long packet. In other words it is a long packet without any payload. Therefore the PktDesc field is the same as a long packet PktDesc, with the exception of the sequence bit, which is not valid for a ping packet. Both the ISISubId and the sequence bit are fixed at 1 for all ping packets. These values were chosen to maximize the Hamming distance from an ACK symbol and to minimize the likelihood of bit stuffing. The ISISubId is unused in ping packets because the ISIMaster is addressing the ISI device rather than one of the DMA channels in the device. The ISISlave may address any ISIId.ISISubId in response if it wishes. The ISISlave will respond to a ping packet with either an explicit ACK (if it has nothing to send), an inferred NAK (if it detected an error in the ping packet) or a long packet (containing the data it wishes to send). Note that inferred NAKs do not result in the retransmission of a ping packet. This is because the ping packet will be retransmitted on a predetermined schedule (see 12.4.4.11 for more details).

[1509] An ISISlave should never respond to a ping message to the broadcast ISIId as this must have been sent in error. An ISI ping packet will never be sent in response to any packet and may only originate from an ISIMaster.

[1510] 12.4.4.7 ISI Short Packet

[1511] The ISI short packet is only 17 bits long, including the Start and Stop fields. A value of b11101011 is proposed for the ACK symbol. As a 16-bit CRC is inappropriate for such a short packet it is not used. In fact there is only one valid value for a short ACK packet as the Start, ACK and Stop symbols all have fixed values. Short packets are only used for acknowledgements (i.e. explicit ACKs). The format of a short ISI packet is shown in FIG. 35 below. The ACK value is chosen to ensure that no bit stuffing is required in the packet and to maximize its Hamming distance from ping and long ISI packets.

[1512] 12.4.4.8 Error Detection and Retransmission

[1513] The 16-bit CRC will provide a high degree of error detection and the probability of transmission errors occurring is very low as the transmission channel (i.e. PCB traces) will have a low inherent bit error rate. The number of undetected errors should therefore be minute.

[1514] The HDLC standard CRC-16 (i.e. G(x)=x¹⁶+x¹²+x⁵+1) is to be used for this calculation, which is to be performed serially. It is calculated over the entire packet (excluding the Start and Stop fields). A simple retransmission mechanism frees the CPU from getting involved in error recovery for most errors because the probability of a transmission error occurring more than once in succession is very low in normal circumstances.
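A bit-serial calculation with this generator polynomial can be sketched as follows. The text fixes only the polynomial; the initial register value of zero, the most-significant-bit-first ordering and the absence of a final inversion are assumptions made for this illustration (the full HDLC frame check sequence uses a different preset and a final complement), and the helper names are hypothetical.

```python
# Bit-serial CRC-16 sketch for G(x) = x^16 + x^12 + x^5 + 1.
# Assumptions (not stated in the text): register preset of 0, no final inversion,
# bits fed in most significant bit first.
POLY = 0x1021  # x^12 + x^5 + 1; the x^16 term is implied by the 16-bit register


def crc16_serial(bits, init=0x0000):
    """Shift one bit per step through a 16-bit linear feedback register."""
    reg = init
    for bit in bits:
        feedback = ((reg >> 15) & 1) ^ bit
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= POLY
    return reg


def bytes_to_bits(data):
    """Expand bytes into a bit list, most significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def word_to_bits(value, width=16):
    """Expand an integer into a bit list, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


message = bytes_to_bits(b"123456789")
crc = crc16_serial(message)

# A receiver running the same register over message + CRC ends with zero.
assert crc16_serial(message + word_to_bits(crc)) == 0
```

With these conventions the receiver can detect an error simply by checking for a non-zero register after the last CRC bit has been shifted in.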

[1515] After each non-short ISI packet is transmitted the transmitting device will open a reply window. The size of the reply window will be ISIShortReplyWin bit times when a short packet is expected in reply, i.e. the size of a short packet, allowing for worst case bit stuffing, bus turnarounds and timing differences. The size of the reply window will be ISILongReplyWin bit times when a long packet is expected in reply, i.e. this will be the max size of a long packet, allowing for worst case bit stuffing, bus turnarounds and timing differences. In both cases if an ACK is received the window will close and another packet can be transmitted but if an ACK is not received then the full length of the window must be waited out.

[1516] As no reply should be sent to a broadcast packet, no reply window should be required; however, all other long packets open a reply window in anticipation of an ACK. While the desire is to minimize the time between broadcast transmissions the simplest solution should be employed. This would imply the same size reply window as other long packets.

[1517] When a packet has been received without any errors the receiving ISI device must transmit its acknowledge packet (which may be either a long or short packet) before the reply window closes. When detected errors do occur the receiving ISI device will not send any response. The transmitting ISI device interprets this lack of response as a NAK indicating that errors were detected in the transmitted packet or that the receiving device was unable to receive the packet for some reason (e.g. its buffers are full). If a long packet was transmitted the transmitting ISI device will keep the transmitted packet in its transmit buffer for retransmission. If the transmitting device is the ISIMaster it will retransmit the packet immediately while if the transmitting device is an ISISlave it will retransmit the packet in response to the next ping it receives from the ISIMaster.

[1518] The transmitting ISI device will continue retransmitting the packet when it receives a NAK until it either receives an ACK or the number of retransmission attempts equals the value of the NumRetries register. If the transmission was unsuccessful then the transmitting device sets the TxErrorSticky bit in its ISIIntStatus register. The receiving device also sets the RxErrorSticky bit in its ISIIntStatus register whenever it detects a CRC error in an incoming packet and is not required to take any further action, as it is up to the transmitting device to detect and rectify the problem. The NumRetries registers in all ISI devices should be set to the same value for consistent operation. Note that successful transmission or reception of ping packets does not affect retransmission operation.

[1519] Note that a transmit error will cause the ISI to stop transmitting. CPU intervention will be required to resolve the source of the problem and to restart the ISI transmit operation. Receive errors however do not affect receive operation and they are collected to facilitate problem debug and to monitor the quality of the ISI physical channel. Transmit or receive errors should be extremely rare and their occurrence will most likely indicate a serious problem.

[1520] Note that broadcast packets are never acknowledged to avoid contention on the common ISI lines. If an ISISlave detects an error in a broadcast packet it should use the message passing mechanism described earlier to alert the ISIMaster to the error if it so wishes.

[1521] 12.4.4.9 Sequence Bit Operation

[1522] To ensure that communication between transmitting and receiving ISI devices is correctly ordered a sequence bit is included in every long packet to keep both devices in step with each other. The sequence bit field is a constant for short or ping packets as they are not used for data transmission. In addition to the transmitted sequence bit all ISI devices keep two local sequence bits, one for each ISISubId. Furthermore each ISI device maintains a transmit sequence bit for each ISIId and ISISubId it is in communication with. For packets sourced from the external host (via USB) the transmit sequence bit is contained in the relevant USBEPnDest register while for packets sourced from the CPU the transmit sequence bit is contained in the CPUISITxBuffCntrl register. The sequence bits for received packets are stored in the ISISubId0Seq and ISISubId1Seq registers. All ISI devices will initialize their sequence bits to 0 after reset. It is the responsibility of software to ensure that the sequence bits of the transmitting and receiving ISI devices are correctly initialized each time a new source is selected for any ISIId.ISISubId channel.

[1523] Sequence bits are ignored by the receiving ISI device for broadcast packets. However the broadcasting ISI device is free to toggle the sequence bit in the broadcast packets since this will not affect operation. The SCB will do this for all USB source data so that there is no special treatment for the sequence bit of a broadcast packet in the transmitting device. CPU sourced broadcasts will have sequence bits toggled at the discretion of the program code.

[1524] Each SoPEC may also ignore the sequence bit on either of its ISISubId channels by setting the appropriate bit in the ISISubIdSeqMask register. The sequence bit should be ignored for ISISubId channels that will carry data that can originate from more than one source and is self-ordering, e.g. control messages.

[1525] A receiving ISI device will toggle its sequence bit addressed by the ISISubId only when the receiver is able to accept data and receives an error-free data packet addressed to it. The transmitting ISI device will toggle its sequence bit for that ISIId.ISISubId channel only when it receives a valid ACK handshake from the addressed ISI device.

[1526] FIG. 36 shows the transmission of two long packets with the sequence bit in both the transmitting and receiving devices toggling from 0 to 1 and back to 0 again. The toggling operation will continue in this manner in every subsequent transmission until an error condition is encountered.

[1527] When the receiving ISI device detects an error in the transmitted long packet or is unable to accept the packet (because of full buffers for example) it will not return any packet and it will not toggle its local sequence bit. An example of this is depicted in FIG. 37. The absence of any response prompts the transmitting device to retransmit the original (seq=0) packet. This time the packet is received without any errors (or buffer space may have been freed) so the receiving ISI device toggles its local sequence bit and responds with an ACK. The transmitting device then toggles its local sequence bit to a 1 upon correct receipt of the ACK.

[1528] However it is also possible for the ACK packet from the receiving ISI device to be corrupted and this scenario is shown in FIG. 38. In this case the receiving device toggles its local sequence bit to 1 when the long packet is received without error and replies with an ACK to the transmitting device. The transmitting device does not receive the ACK correctly and so does not change its local sequence bit. It then retransmits the seq=0 long packet. When the receiving device finds that there is a mismatch between the transmitted sequence bit and the expected (local) sequence bit it discards the long packet and replies with an ACK. When the transmitting ISI device correctly receives the ACK it updates its local sequence bit to a 1, thus restoring synchronization. Note that when the ISISubIdSeqMask bit for the addressed ISISubId is set then the retransmitted packet is not discarded and so a duplicate packet will be received. The data contained in the packet should be self-ordering and so the software handling these packets (most likely control messages) is expected to deal with this eventuality.
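The sequence bit and retransmission behaviour described in this section can be modelled with a small simulation. This is a simplified sketch for a single ISIId.ISISubId channel: the class names, the `events` list used to inject faults and the `ignore_seq` flag (standing in for the ISISubIdSeqMask bit) are all hypothetical, and reply windows, pings and buffers are not modelled.

```python
# Simplified model of sequence bit handling on one ISIId.ISISubId channel.
# All names here are illustrative; they are not SoPEC register or signal names.
class Receiver:
    def __init__(self, ignore_seq=False):
        self.seq = 0                  # local (expected) sequence bit
        self.ignore_seq = ignore_seq  # stands in for the ISISubIdSeqMask bit
        self.delivered = []

    def receive(self, seq, payload, corrupted=False):
        """Return True if an ACK is sent, False for no reply (inferred NAK)."""
        if corrupted:
            return False              # error detected: no reply, no toggle
        if self.ignore_seq:
            self.delivered.append(payload)   # duplicates are passed to software
        elif seq == self.seq:
            self.delivered.append(payload)
            self.seq ^= 1
        # On a sequence mismatch the packet is discarded but still acknowledged.
        return True


class Sender:
    def __init__(self, num_retries):
        self.seq = 0
        self.num_retries = num_retries
        self.tx_error = False         # stands in for TxErrorSticky

    def send(self, receiver, payload, events=()):
        """events[i] is 'ok', 'data_error' or 'ack_lost' for attempt i."""
        for attempt in range(1 + self.num_retries):
            event = events[attempt] if attempt < len(events) else "ok"
            acked = receiver.receive(self.seq, payload,
                                     corrupted=(event == "data_error"))
            if acked and event != "ack_lost":
                self.seq ^= 1         # toggle only on a valid ACK
                return True
        self.tx_error = True
        return False


# Corrupted ACK: the duplicate is discarded and both sides end in step.
rx, tx = Receiver(), Sender(num_retries=3)
assert tx.send(rx, "A", events=["ack_lost"])
assert rx.delivered == ["A"] and (tx.seq, rx.seq) == (1, 1)
```

Constructing the receiver with `ignore_seq=True` and repeating the same run delivers the payload twice, which corresponds to the duplicate packet case that software is expected to handle.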

[1529] 12.4.4.10 Flow Control

[1530] The ISI also supports flow control by treating it in exactly the same manner as an error in the received packet. Because the SCB enjoys greater guaranteed bandwidth to DRAM than both the ISI and USB can supply, flow control should not be required during normal operation. Any blockage on a DMA channel will soon result in the NumRetries value being exceeded and transmission from that SoPEC being halted. If a SoPEC NAKs a packet because its RxBuffer is full it will flag an overflow condition. This condition can potentially cause a CPU interrupt, if the corresponding interrupt is enabled. The RxOverflowSticky bit of its ISIIntStatus register reflects this condition. Because flow control is treated in the same manner as an error the transmitting ISI device will not be able to differentiate a flow control condition from an error in the transmitted packet.

[1531] 12.4.4.11 Auto-Ping Operation

[1532] While the CPU of the ISIMaster could send a ping packet by writing the appropriate header to the CPUISITxBuffCntrl register it is expected that all ping packets will be generated in the ISI itself. The use of automatically generated ping packets ensures that ISISlaves will be given access to the ISI bus with a programmable minimum guaranteed frequency in addition to whenever it would otherwise be idle. Five registers facilitate the automatic generation of ping messages within the ISI: PingSchedule0, PingSchedule1, PingSchedule2, ISITotalPeriod and ISILocalPeriod. Auto-pinging will be enabled if any bit of any of the PingScheduleN registers is set and disabled if all PingScheduleN registers are 0x0000.

[1533] Each bit of the 15-bit PingScheduleN register corresponds to an ISIId that is used in the Address field of the ping packet and a 1 in the bit position indicates that a ping packet is to be generated for that ISIId. A 0 in any bit position will ensure that no ping packet is generated for that ISIId. As ISISlaves may differ in their bandwidth requirement (particularly if a storage SoPEC is present) three different PingSchedule registers are used to allow an ISISlave to receive up to three times the number of pings as another active ISISlave. When the ISIMaster is not sending long packets (sourced from either the CPU or USB in the case of a SoPEC ISIMaster) ISI ping packets will be transmitted according to the pattern given by the three PingScheduleN registers. The ISI will start with the lsb of the PingSchedule0 register and work its way from lsb through msb of each of the PingScheduleN registers. When the msb of PingSchedule2 is reached the ISI returns to the lsb of PingSchedule0 and continues to cycle through each bit position of each PingScheduleN register. The ISI has more than enough time to work out the destination of the next ping packet while a ping or long packet is being transmitted.
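The order in which the three schedule registers are scanned can be illustrated with a short sketch. The function names are hypothetical, and the register values in the example are chosen purely for illustration.

```python
# Sketch of the auto-ping scan order over the three 15-bit PingScheduleN registers.
from itertools import cycle, islice

SCHEDULE_WIDTH = 15  # bit n of each register corresponds to ISIId n


def ping_rotation(schedules):
    """ISIIds pinged in one full pass: lsb to msb of register 0, then 1, then 2."""
    targets = []
    for schedule in schedules:
        for isi_id in range(SCHEDULE_WIDTH):
            if (schedule >> isi_id) & 1:
                targets.append(isi_id)
    return targets


def auto_ping_targets(schedules):
    """Endless stream of ping targets; empty when all registers are zero."""
    return cycle(ping_rotation(schedules))


# Illustrative values: ISIId3 is set in all three registers, ISIId2 in two, ISIId1 in one.
schedules = (0x0E, 0x0C, 0x08)
assert ping_rotation(schedules) == [1, 2, 3, 2, 3, 3]
first_eight = list(islice(auto_ping_targets(schedules), 8))
```

Setting an ISIId in more than one register is what gives that device a larger share of the pings in each rotation.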

[1534] With the addition of auto-ping operation we now have three potential sources of packets in an ISIMaster SoPEC: USB, CPU and auto-ping. Arbitration between the CPU and USB for access to the ISI is handled outside the ISI. To ensure that local packets get priority whenever possible and that ping packets can have some guaranteed access to the ISI we use two 4-bit counters whose reload values are contained in the ISITotalPeriod and ISILocalPeriod registers. As we saw in section 12.4.4.1 every ISI transaction is initiated by the ISIMaster transmitting either a long packet or a ping packet. The ISITotalPeriod counter is decremented for every ISI transaction (i.e. either long or ping) when its value is non-zero. The ISILocalPeriod counter is decremented for every local packet that is transmitted. Neither counter is decremented by a retransmitted packet. If the ISITotalPeriod counter is zero then ping packets will not change its value from zero. Both the ISITotalPeriod and ISILocalPeriod counters are reloaded by the next local packet transmit request after the ISITotalPeriod counter has reached zero and this local packet has priority over pings.

[1535] The amount of guaranteed ISI bandwidth allocated to both local and ping packets is determined by the values of the ISITotalPeriod and ISILocalPeriod registers. Local packets will always be given priority when the ISILocalPeriod counter is non-zero. Ping packets will be given priority when the ISILocalPeriod counter is zero and the ISITotalPeriod counter is still non-zero.

[1536] Note that ping packets are very likely to get more than their guaranteed bandwidth as they will be transmitted whenever the ISI bus would otherwise be idle (i.e. no pending local packets). In particular when the ISITotalPeriod counter is zero it will not be reloaded until another local packet is pending and so ping packets transmitted when the ISITotalPeriod counter is zero will be in addition to the guaranteed bandwidth. Local packets on the other hand will never get more than their guaranteed bandwidth because each local packet transmitted decrements both counters and will cause the counters to be reloaded when the ISITotalPeriod counter is zero. The difference between the values of the ISITotalPeriod and ISILocalPeriod registers determines the number of automatically generated ping packets that are guaranteed to be transmitted every ISITotalPeriod number of ISI transactions. If the ISITotalPeriod and ISILocalPeriod values are the same then the local packets will always get priority and could totally exclude ping packets if the CPU always has packets to send.

[1537] For example if ISITotalPeriod=0xC; ISILocalPeriod=0x8; PingSchedule0=0x0E; PingSchedule1=0x0C and PingSchedule2=0x08 then four ping messages are guaranteed to be sent in every 12 ISI transactions. Furthermore ISIId3 will receive 3 times the number of ping packets as ISIId1 and ISIId2 will receive twice as many as ISIId1. Thus over a period of 36 contended ISI transactions (allowing for two full rotations through the three PingScheduleN registers) when local packets are always pending 24 local packets will be sent, ISIId1 will receive 2 ping packets, ISIId2 will receive 4 pings and ISIId3 will receive 6 ping packets. If local traffic is less frequent then the ping frequency will automatically adjust upwards to consume all remaining ISI bandwidth.
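The figures in this example can be checked with a small simulation of the two period counters. This is a sketch of the arbitration as described in the text, not of the hardware: the function `simulate` and its parameters are hypothetical, retransmissions are not modelled, and at least one schedule bit is assumed to be set.

```python
# Sketch of local/ping arbitration using the ISITotalPeriod and ISILocalPeriod counters.
# Assumptions: no retransmissions, and at least one PingScheduleN bit is set.
from collections import Counter
from itertools import cycle


def simulate(total_period, local_period, schedules, transactions,
             local_pending=lambda n: True):
    """Return a log with 'local' or the pinged ISIId for each ISI transaction."""
    rotation = [i for s in schedules for i in range(15) if (s >> i) & 1]
    pings = cycle(rotation)
    total = local = 0                     # both counters start expired
    log = []
    for n in range(transactions):
        pending = local_pending(n)
        if pending and total == 0:
            # A local transmit request after the total counter expires reloads both.
            total, local = total_period, local_period
        if pending and local > 0:
            log.append("local")           # local packets have priority
            local -= 1
            total -= 1
        else:
            log.append(next(pings))       # guaranteed or idle-time ping
            if total > 0:
                total -= 1
    return log


log = simulate(0xC, 0x8, (0x0E, 0x0C, 0x08), transactions=36)
counts = Counter(log)
assert counts["local"] == 24
assert (counts[1], counts[2], counts[3]) == (2, 4, 6)
```

Passing `local_pending=lambda n: False` models an idle CPU and USB, in which case every transaction becomes a ping, matching the statement that pings consume any remaining ISI bandwidth.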

[1538] 12.4.5 Wake-Up From Sleep Mode

[1539] Either the PrintMaster SoPEC or the external host may place any of the ISISlave SoPECs in sleep mode prior to going into sleep mode itself. The ISISlave device should then ensure that its ISIWakeupEnable bit of the WakeupEnable register (see Table 34) is set prior to entering sleep mode. In an ISISlave device the ISI block will continue to receive power and clock during sleep mode so that it may monitor the gpio_isi_din lines for activity. When ISI activity is detected during sleep mode and the ISIWakeupEnable bit is set the ISI asserts the isi_cpr_reset_n signal. This will bring the rest of the chip out of sleep mode by means of a wakeup reset. See chapter 16 for more details of reset propagation.

[1540] 12.4.6 Implementation

[1541] Although the ISI consists of either 2 or 4 ISI data lines over which a serial data stream is demultiplexed, each ISI line is treated as a separate serial link at the physical layer. This permits a certain amount of skew between the ISI lines that could not be tolerated if the lines were treated as a parallel bus. A lower Bit Error Rate (BER) can be achieved if the serial data recovery is performed separately on each serial link. FIG. 39 illustrates the ISI sub block partitioning.

[1542] 12.4.6.1 ISI Sub-Block Partition

[1543] Definition of I/Os.

TABLE 34 ISI I/O
Port name    Pins    I/O    Description

Clock and Reset
isi_pclk    1    In    ISI primary clock.
isi_reset_n    1    In    ISI reset. Active low. Asserting isi_reset_n will reset all ISI logic. Synchronous to isi_pclk.

Configuration
isi_go    1    In    ISI GO. Active high. When GO is de-asserted, all ISI statemachines are reset to their idle states, all ISI output signals are de-asserted, but all ISI counters retain their values. When GO is asserted, all ISI counters are reset and all ISI statemachines and output signals will return to their normal mode of operation.
isi_master_select    1    In    ISI master select. Determines whether the SoPEC is an ISIMaster or not. 1 = ISIMaster, 0 = ISISlave.
isi_id[3:0]    4    In    ISI ID for this device.
isi_retries[3:0]    4    In    ISI number of retries. Number of times a transmitting ISI device will attempt retransmission of a NAK'd packet before aborting the transmission and flagging an error. The value of this configuration signal should not be changed while there are valid packets in the Tx buffer.
isi_ping_schedule0[14:0]    15    In    ISI auto ping schedule #0. Denotes which ISIIds will receive ping packets. Note that bit0 refers to ISIId0, bit1 to ISIId1 . . . bit14 to ISIId14. Setting a bit in this schedule will enable auto ping generation for the corresponding ISI ID. The ISI will start from bit 0 of isi_ping_schedule0 and cycle through to bit 14, generating pings for each bit that is set. This operation will be performed in sequence from isi_ping_schedule0 through isi_ping_schedule2.
isi_ping_schedule1[14:0]    15    In    As per isi_ping_schedule0.
isi_ping_schedule2[14:0]    15    In    As per isi_ping_schedule0.
isi_total_period[3:0]    4    In    Reload value of the ISI Total Period Counter.
isi_local_period[3:0]    4    In    Reload value of the ISI Local Period Counter.
isi_number_pins    1    In    Number of active ISI data pins. Used to select how many serial data pins will be used to transmit and receive data. Should reflect the number of ISI device data pins that are in use. 1 = isi_data[3:0] active, 0 = isi_data[1:0] active.
isi_turn_around[3:0]    4    In    ISI bus turn around time in ISI clock cycles (32 MHz).
isi_short_reply_win[4:0]    5    In    ISI short packet reply window in ISI clock cycles (32 MHz).
isi_long_reply_win[8:0]    9    In    ISI long packet reply window in ISI clock cycles (32 MHz).
isi_tx_enable    1    In    ISI transmit enable. Active high. Enables ISI transmission of long or ping packets. ACKs may still be transmitted when this bit is 0. The value of this configuration signal should not be changed while there are valid packets in the Tx buffer.
isi_rx_enable    1    In    ISI receive enable. Active high. Enables ISI packet reception. Any activity on the ISI bus will be ignored when this signal is de-asserted. This signal should only be de-asserted if the ISI block is not required for use in the design.
isi_bit_stuff_rate[3:0]    4    In    ISI bit stuffing limit. Allows the bit stuffing counter value to be programmed. Is loaded into the 4 upper bits of the 7-bit wide bit stuffing counter. The lower bits are always loaded with b111, to prevent bit stuffing for less than 7 consecutive ones or zeroes. E.g. b0000 : stuff_count = b0000111 : bit stuff after 7 consecutive 0/1; b1111 : stuff_count = b1111111 : bit stuff after 127 consecutive 0/1.

Serial Link Signals
isi_ser_data_in[3:0]    4    In    ISI Serial data inputs. Each bit corresponds to a separate serial link.
isi_ser_data_out[3:0]    4    Out    ISI Serial data outputs. Each bit corresponds to a separate serial link.
isi_ser_data_en[3:0]    4    Out    ISI Serial data driver enables. Active high. Each bit corresponds to a separate serial link.

Tx Packet Buffer
isi_tx_wr_en    1    In    ISI Tx FIFO write enable. Active high. Asserting isi_tx_wr_en will write the 64 bit data on isi_tx_wr_data to the FIFO, providing that space is available in the FIFO. If isi_tx_wr_en remains asserted after the last entry in the current packet is written, the write operation will wrap around to the start of the next packet, providing that space is available for a second packet in the FIFO.
isi_tx_wr_data[63:0]    64    In    ISI Tx FIFO write data.
isi_tx_ping    1    In    ISI Tx FIFO ping packet select. Active high. Asserting isi_tx_ping will queue a ping packet for transmission, as opposed to a long packet. Although there is no data payload for a ping packet, a packet location in the FIFO is used as a 'place holder' for the ping packet. Any data written to the associated packet location in the FIFO will be discarded when the ping packet is transmitted.
isi_tx_id[3:0]    4    In    ISI Tx FIFO packet ID. ISI ID for each packet written to the FIFO. Registered when the last entry of the packet is written.
isi_tx_sub_id    1    In    ISI Tx FIFO packet sub ID. ISI sub ID for each packet written to the FIFO. Registered when the last entry of the packet is written.
isi_tx_pkt_count[1:0]    2    Out    ISI Tx FIFO packet count. Indicates the number of packets contained in the FIFO. The FIFO has a capacity of 2 × 256 bit packets. Range is b00->b10.
isi_tx_word_count[2:0]    3    Out    ISI Tx FIFO current packet word count. Indicates the number of words contained in the current Tx packet location of the Tx FIFO. Each packet location has a capacity of 4 × 64 bit words. Range is b000->b100.
isi_tx_empty    1    Out    ISI Tx FIFO empty. Active high. Indicates that no packets are present in the FIFO.
isi_tx_full    1    Out    ISI Tx FIFO full. Active high. Indicates that 2 packets are present in the FIFO, therefore no more packets can be transmitted.
isi_tx_over_flow    1    Out    ISI Tx FIFO overflow. Active high. Indicates that a write operation was performed on a full FIFO. The write operation will have no effect on the contents of the FIFO or the write pointer.
isi_tx_error    1    Out    ISI Tx FIFO error. Active high. Indicates that an error occurred while transmitting the packet currently at the head of the FIFO. This will happen if the number of transmission attempts exceeds isi_tx_retries.
isi_tx_desc[2:0]    3    Out    ISI Tx packet descriptor field. ISI packet descriptor field for the packet currently at the head of the FIFO. See Table for details. Only valid when isi_tx_empty = 0, i.e. when there is a valid packet in the FIFO.
isi_tx_addr[4:0]    5    Out    ISI Tx packet address field. ISI address field for the packet currently at the head of the FIFO. See Table for details. Only valid when isi_tx_empty = 0, i.e. when there is a valid packet in the FIFO.

Rx Packet FIFO
isi_rx_rd_en    1    In    ISI Rx FIFO read enable. Active high. Asserting isi_rx_rd_en will drive isi_rx_rd_data with valid data, from the Rx packet at the head of the FIFO, providing that data is available in the FIFO. If isi_rx_rd_en remains asserted after the last entry is read from the current packet, the read operation will wrap around to the start of the next packet, providing that a second packet is available in the FIFO.
isi_rx_rd_data[63:0]    64    Out    ISI Rx FIFO read data.
isi_rx_sub_id    1    Out    ISI Rx packet sub ID. Indicates the ISI sub ID associated with the packet at the head of the Rx FIFO.
isi_rx_pkt_count[1:0]    2    Out    ISI Rx FIFO packet count. Indicates the number of packets contained in the FIFO. The FIFO has a capacity of 2 × 256 bit packets. Range is b00->b10.
isi_rx_word_count[2:0]    3    Out    ISI Rx FIFO current packet word count. Indicates the number of words contained in the Rx packet location at the head of the FIFO. Each packet location has a capacity of 4 × 64 bit words. Range is b000->b100.
isi_rx_empty    1    Out    ISI Rx FIFO empty. Active high. Indicates that no packets are present in the FIFO.
isi_rx_full    1    Out    ISI Rx FIFO full. Active high. Indicates that 2 packets are present in the FIFO, therefore no more packets can be received.
isi_rx_over_flow    1    Out    ISI Rx FIFO overflow. Active high. Indicates that a packet was addressed to the local ISI device, but the Rx FIFO was full, resulting in a NAK.
isi_rx_under_run    1    Out    ISI Rx FIFO underrun. Active high. Indicates that a read operation was performed on an empty FIFO. The invalid read will return the contents of the memory location currently addressed by the FIFO read pointer and will have no effect on the read pointer.
isi_rx_frame_error    1    Out    ISI Rx framing error. Active high. Asserted by the ISI when a framing error is detected in the received packet, which can be caused by an incorrect Start or Stop field or by bit stuffing errors. The associated packet will be dropped.
isi_rx_crc_error    1    Out    ISI Rx CRC error. Active high. Asserted by the ISI when a CRC error is detected in an incoming packet. Other than dropping the errored packet ISI reception is unaffected by a CRC error.

[1544] 12.4.6.2 ISI Serial Interface Engine (isi_sie)

[1545] There are 4 instantiations of the isi_sie sub block in the ISI, 1 per ISI serial link. The isi_sie is responsible for Rx serial data sampling, Tx serial data output and bit stuffing.

[1546] Data is sampled based on a phase detection mechanism. The incoming ISI serial data stream is oversampled 5 times per ISI bit period. The phase of the incoming data is determined by detecting transitions in the ISI serial data stream, which indicate the ISI bit boundaries. An ISI bit boundary is defined as the sample phase at which a transition was detected.

[1547] The basic functional components of the isi_sie are detailed in FIG. 40. These components are simply a grouping of logical functionality and do not necessarily represent hierarchy in the design.

[1548] 12.4.6.2.1 SIE Edge Detection and Data I/O

[1549] The basic structure of the data I/O and edge detection mechanism is detailed in FIG. 41.

[1550] NOTE: Serial data from the receiver in the pad MUST be synchronized to the isi_pclk domain with a 2 stage shift register external to the ISI, to reduce the risk of metastability. ser_data_out and ser_data_en should be registered externally to the ISI.

[1551] The Rx/Tx statemachine drives ser_data_en, stuff_1_en and stuff_0_en. The signals stuff_1_en and stuff_0_en cause a one or a zero, respectively, to be driven on ser_data_out when they are asserted; otherwise fifo_rd_data is selected.

[1552] 12.4.6.2.2 SIE Rx/Tx Statemachine

[1553] The Rx/Tx statemachine is responsible for the transmission of ISI Tx data and the sampling of ISI Rx data. Each ISI bit period is 5 isi_pclk cycles in duration.

[1554] The Tx cycle of the Rx/Tx statemachine is illustrated in FIG. 42. It generates each ISI bit that is transmitted. States tx0->tx4 represent each of the 5 isi_pclk phases that constitute a Tx ISI bit period. ser_data_en controls the tristate enable for the ISI line driver in the bidirectional pad, as shown in FIG. 41. rx_tx_cycle is asserted during both Rx and Tx states to indicate an active Rx or Tx cycle. It is primarily used to enable bit stuffing.

[1555] NOTE: All statemachine signals are assumed to be ‘0’ unless otherwise stated.

[1556] The Tx cycle for Tx bit stuffing when the Rx/Tx statemachine inserts a ‘0’ into the bitstream can be seen in FIG. 43.

[1557] NOTE: All statemachine signals are assumed to be ‘0’ unless otherwise stated.

[1558] The Tx cycle for Tx bit stuffing when the Rx/Tx statemachine inserts a ‘1’ into the bitstream can be seen in FIG. 44.

[1559] NOTE: All statemachine signals are assumed to be ‘0’ unless otherwise stated.

[1560] The tx* and stuff* states are detailed separately for clarity. They could easily be combined when coding the statemachine; however, it would be better for verification and debugging if they were kept separate.

[1561] The Rx cycle of the ISI Rx/Tx statemachine is detailed in FIG. 45. The Rx cycle of the Rx/Tx statemachine samples each ISI bit that is received. States rx0->rx4 represent each of the 5 isi_pclk phases that constitute an Rx ISI bit period.

[1562] The optimum sample position for an ideal ISI bit period is 2 isi_pclk cycles after the ISI bit boundary sample, which should result in a data sample close to the centre of the ISI bit period. rx_sample is asserted during the rx2 state to indicate a valid ISI data sample on rx_bit, unless the bit should be stripped when flagged by the bit stuffing statemachine, in which case rx_sample is not asserted during rx2 and the bit is not written to the FIFO. When edge is asserted, it resets the Rx cycle to the rx0 state, from any rx state. This is how the isi_sie tracks the phase of the incoming data. The Rx cycle will cycle through states rx0->rx4 until edge is asserted to reset the sample phase, or a tx_req is asserted indicating that the ISI needs to transmit.

[1563] Due to the 5 times oversampling a maximum phase error of 0.4 of an ISI bit period (2 isi_pclk cycles out of 5) can be tolerated.
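The phase tracking described above can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions: it ignores bit stripping, the Tx cycle and the external synchronizer, and the function names (oversample, recover_bits) are hypothetical rather than taken from the design.

```python
OVERSAMPLE = 5      # isi_pclk cycles per ISI bit period
SAMPLE_PHASE = 2    # rx2: two cycles after the bit boundary sample


def oversample(bits, factor=OVERSAMPLE):
    """Expand ideal data bits into one line sample per isi_pclk cycle."""
    return [b for b in bits for _ in range(factor)]


def recover_bits(samples):
    """Recover data bits from an oversampled line (simplified Rx cycle)."""
    recovered = []
    phase = 0           # models states rx0..rx4
    previous = None
    for level in samples:
        edge = previous is not None and level != previous
        if edge:
            phase = 0   # a transition marks the bit boundary (rx0)
        if phase == SAMPLE_PHASE:
            recovered.append(level)   # rx_sample would be asserted here
        phase = (phase + 1) % OVERSAMPLE
        previous = level
    return recovered
```

In this model a bit period that is one isi_pclk cycle short or long is still sampled correctly, because each transition re-centres the sample point on the following rx2 phase.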

[1564] NOTE: All statemachine signals are assumed to be ‘0’ unlessotherwise stated.

[1565] An example of the Tx data generation mechanism is detailed in FIG. 46. tx_req and fifo_wr_tx are driven by the framer block.

[1566] An example of the Rx data sampling functional timing is detailed in FIG. 47. The dashed lines on the ser_data_in_ff signal indicate where the Rx/Tx statemachine perceived the bit boundary to be, based on the phase of the last ISI bit boundary. It can be seen that, in the absence of a transition, data is sampled during the same phase as the previous bit.

[1567] 12.4.6.2.3 SIE Rx/Tx FIFO

[1568] The Rx/Tx FIFO is a 7×1 bit synchronous look-ahead FIFO that is shared for Tx and Rx operations. It is required to absorb any Rx/Tx latency caused by bit stripping/stuffing on a per ISI line basis, i.e. some ISI lines may require bit stripping/stuffing during an ISI bit period while the others may not, which would lead to a loss of synchronization between the data of the different ISI lines, if a FIFO were not present in each isi_sie.

[1569] The basic functional components of the FIFO are detailed in FIG. 48. tx_ready is driven by the Rx/Tx statemachine and selects which signals control the read and write operations. tx_ready=1 during ISI transmission and selects the fifo_*tx control and data signals. tx_ready=0 during ISI reception and selects the fifo_*rx control and data signals. fifo_reset is driven by the Rx/Tx statemachine. It is active high and resets the FIFO and associated logic before/after transmitting a packet to discard any residual data.

[1570] The size of the FIFO is based on the maximum bit stuffing frequency and the size of the shift register used to segment/re-assemble the multiple serial streams in the ISI framing logic. The maximum bit stuffing frequency is every 7 consecutive ones or zeroes. The shift register used is 32 bits wide. This implies that the maximum number of stuffed bits encountered in the time it takes to fill/empty the shift register is 4. This would suggest that 4×1 bit would be the minimum ideal size of the FIFO. However it is necessary to allow for different skew and phase error between the ISI lines, hence a 7×1 bit FIFO.
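The figure of 4 stuffed bits follows from integer division of the shift register width by the minimum stuffing interval. A minimal worked check, in which the function name max_stuffed_bits is hypothetical:

```python
def max_stuffed_bits(shift_register_bits, min_stuff_interval):
    """Worst-case number of stuffed bits while the shift register fills or empties."""
    return shift_register_bits // min_stuff_interval


# 32-bit shift register, one stuffed bit at most every 7 identical bits
MINIMUM_FIFO_DEPTH = max_stuffed_bits(32, 7)
```

The remaining 3 entries of the 7×1 bit FIFO are the margin the text attributes to skew and phase error between ISI lines.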

[1571] The FIFO is controlled by the isi_sie during packet reception and is controlled by the isi_frame block during packet transmission. This is illustrated in FIG. 49. The signal tx_ready selects which mode the FIFO control signals operate in. When tx_ready=0, i.e. Rx mode, the isi_sie control signals rx_sample, fifo_rd_rx and ser_data_in_ff are selected. When tx_ready=1, i.e. Tx mode, the isi_frame control signals fifo_wr_tx, fifo_rd_tx and fifo_wr_data_tx are selected.

[1572] 12.4.6.3 Bit Stuffing

[1573] Programmable bit stuffing is implemented in the isi_sie. This is to allow the system to determine the amount of bit stuffing necessary for the devices in a specific ISI system. It is unlikely that bit stuffing would be required in a system using a 100 ppm rated crystal. However, a programmable bit stuffing implementation is much more versatile and robust.

[1574] The bit stuffing logic consists of a counter and a statemachine that track the number of consecutive ones or zeroes that are transmitted or received and flag the Rx/Tx statemachine when the bit stuffing limit has been reached. The counter, stuff_count, is a 7 bit counter, which decrements when rx_sample is asserted on an Rx cycle or when fifo_rd_tx is asserted on a Tx cycle. The upper 4 bits of stuff_count are loaded with isi_bit_stuff_rate. The lower 3 bits of stuff_count are always loaded with b111, i.e. for isi_bit_stuff_rate=b000, the counter would be loaded with b0000111. This is to prevent bit stuffing for less than 7 consecutive ones or zeroes. This allows the bit stuffing limit to be set in the range 7->127 consecutive ones or zeroes.
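The counter load value can be written out as a short calculation. This is a sketch of the arithmetic only, assuming isi_bit_stuff_rate is a 4 bit field; the function name stuff_count_load_value is hypothetical.

```python
def stuff_count_load_value(isi_bit_stuff_rate):
    """Return the 7 bit value loaded into stuff_count.

    The upper 4 bits come from isi_bit_stuff_rate (assumed to be a 4 bit
    field) and the lower 3 bits are always b111.
    """
    if not 0 <= isi_bit_stuff_rate <= 0xF:
        raise ValueError("isi_bit_stuff_rate must fit in 4 bits")
    return (isi_bit_stuff_rate << 3) | 0b111
```

The smallest setting gives a load value of 7 and the largest gives 127, matching the stated range of the bit stuffing limit.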

[1575] NOTE: It is extremely important that a change in the bit stuffing rate, isi_bit_stuff_rate, is carefully coordinated between ISI devices in a system. ISI devices with different bit stuffing settings will not be able to communicate reliably with each other. It is recommended that all ISI devices in a system default to the safest bit stuffing rate (isi_bit_stuff_rate=b000) at reset. The system can then co-ordinate the change to an optimum bit stuffing rate.

[1576] The ISI bit stuffing statemachine Tx cycle is shown in FIG. 50. The counter is loaded when stuff_count_load is asserted.

[1577] NOTE: All statemachine signals are assumed to be ‘0’ unless otherwise stated.

[1578] The ISI bit stuffing statemachine Rx cycle is shown in FIG. 51. It should be noted that the statemachine enters the strip state when stuff_count=0x2. This is because the statemachine can only transition to rx0 or rx1 when rx_sample is asserted as it needs to be synchronized to changes in sampling phase introduced by the Rx/Tx statemachine. Therefore a one or a zero has already been sampled by the time it enters rx0 or rx1. This is not the case for the Tx cycle, as it will always have a stable 5 isi_pclk cycles per bit period and relies purely on the data value when entering tx0 or tx1. The Tx cycle therefore enters stuff1 or stuff0 when stuff_count=0x1.

[1579] NOTE: All statemachine signals are assumed to be ‘0’ unless otherwise stated.

[1580] 12.4.6.4 ISI Framing and CRC Sub-Block (isi_frame)

[1581] 12.4.6.4.1 CRC Generation/Checking

[1582] A Cyclic Redundancy Checksum (CRC) is calculated over all fields except the start and stop fields for each long or ping packet transmitted. The receiving ISI device will perform the same calculation on the received packet to verify the integrity of the packet. The procedure used in the CRC generation/checking is the same as the Frame Checking Sequence (FCS) procedure used in HDLC, detailed in ITU-T Recommendation T30 [39].

[1583] For generation/checking of the CRC field, the shift register illustrated in FIG. 52 is used to perform the modulo 2 division on the packet contents by the polynomial G(x)=x¹⁶+x¹²+x⁵+1.

[1584] To generate the CRC for a transmitted packet, where T(x)=[Packet Descriptor field, Address field, Data Payload field] (a ping packet will not contain a data payload field):

[1585] Set the shift register to 0xFFFF.

[1586] Shift T(x) through the shift register, LSB first. This can occur in parallel with the packet transmission.

[1587] Once each bit of T(x) has been shifted through the register, it will contain the remainder of the modulo 2 division T(x)/G(x).

[1588] Perform a ones complement of the register contents, giving the CRC field which is transmitted MSB first, immediately following the last bit of T(x).

[1589] To check the CRC for a received packet, where R(x)=[Packet Descriptor field, Address field, Data Payload field, CRC field] (a ping packet will not contain a data payload field):

[1590] Set the shift register to 0xFFFF.

[1591] Shift R(x) through the shift register, LSB first. This can occur in parallel with the packet reception.

[1592] Once each bit of the packet has been shifted through the register, it will contain the remainder of the modulo 2 division R(x)/G(x).

[1593] The remainder should equal b0001110100001111, for a packet without errors.
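The generation and checking steps above can be sketched as a bit-serial software model. This is an illustrative sketch, not the FIG. 52 hardware: the function names are hypothetical, and the packet is assumed to be presented as bytes whose bits are shifted LSB first.

```python
CRC_POLY = 0x1021                  # x^16 + x^12 + x^5 + 1 (x^16 term implicit)
CRC_INIT = 0xFFFF                  # shift register preset
CRC_RESIDUE = 0b0001110100001111   # expected remainder for an error-free packet


def shift_bits(register, bits):
    """Shift bits through the 16 bit CRC register, in the order given."""
    for bit in bits:
        feedback = ((register >> 15) & 1) ^ bit
        register = (register << 1) & 0xFFFF
        if feedback:
            register ^= CRC_POLY
    return register


def bytes_to_bits_lsb_first(data):
    """Serialise bytes with the LSB of each byte first (assumed bit order)."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def crc_field_bits(packet_bits):
    """Ones complement of the remainder, returned MSB first."""
    crc = shift_bits(CRC_INIT, packet_bits) ^ 0xFFFF
    return [(crc >> i) & 1 for i in range(15, -1, -1)]


def crc_ok(received_bits):
    """Check a received bit stream that ends with its CRC field."""
    return shift_bits(CRC_INIT, received_bits) == CRC_RESIDUE
```

A transmitter would append crc_field_bits(t) to the packet bits t, and a receiver would pass the whole received stream to crc_ok. The fixed remainder arises because the ones complement of the CRC is shifted through the register instead of the CRC itself.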

[1594] 12.5 CTRL (Control Sub-Block)

[1595] 12.5.1 Overview

[1596] The CTRL is responsible for high level control of the SCB sub-blocks and coordinating access between them. All control and status registers for the SCB are contained within the CTRL and are accessed via the CPU interface. The other major components of the CTRL are the SCB Map logic and the DMA Manager logic.

[1597] 12.5.2 SCB Mapping

[1598] In order to support maximum flexibility when moving data through a multi-SoPEC system it is possible to map any USB endpoint onto either DMAChannel within any SoPEC in the system. The SCB map, and indeed the SCB itself, is based around the concept of an ISIId and an ISISubId. Each SoPEC in the system has a unique ISIId and two ISISubIds, namely ISISubId0 and ISISubId1. We use the convention that ISISubId0 corresponds to DMAChannel0 in each SoPEC and ISISubId1 corresponds to DMAChannel1. The naming convention for the ISIId is shown in Table 35 below and this would correspond to a multi-SoPEC system such as that shown in FIG. 27. We use the term ISIId instead of SoPECId to avoid confusion with the unique ChipID used to create the SoPEC_id and SoPEC_id_key (see chapter 17 and [9] for more details).
TABLE 35 ISIId naming convention
ISIId SoPEC to which it refers
0-14 Standard device ISIIds (0 is the power-on reset value)
15 Broadcast ISIId

[1599] The combined ISIId and ISISubId therefore allows the ISI to address DMAChannel0 or DMAChannel1 on any SoPEC device in the system. The ISI, DMA manager and SCB map hardware use the ISIId and ISISubId to handle the different data streams that are active in a multi-SoPEC system, as does the software running on the CPU of each SoPEC. In this document we will identify DMAChannels as ISIx.y where x is the ISIId and y is the ISISubId. Thus ISI2.1 refers to DMAChannel1 of ISISlave2. Any data sent to a broadcast channel, i.e. ISI15.0 or ISI15.1, are received by every ISI device in the system including the ISIMaster (which may be an ISI-Bridge). The USB device controller and software stacks have no understanding of the ISIId and ISISubId, but the Silverbrook printer driver software running on the external host does make use of them. USB is simply used as a data transport: the mapping of USB device endpoints onto ISIId and SubId is communicated from the external host Silverbrook code to the SoPEC Silverbrook code through USB control (or possibly bulk data) messages, i.e. the mapping information is simply data payload as far as USB is concerned. The code running on SoPEC is responsible for parsing these messages and configuring the SCB accordingly.

[1600] The use of just two DMAChannels places some limitations on what can be achieved without software intervention. For every SoPEC in the system there are more potential sources of data than there are sinks. For example an ISISlave could receive both control and data messages from the ISIMaster SoPEC in addition to control and data from the external host, either specifically addressed to that particular ISISlave or over the broadcast ISI channel. However all ISISlaves only have two possible data sinks, i.e. DMAChannel0 and DMAChannel1. Another example is the ISIMaster in a multi-SoPEC system which may receive control messages from each SoPEC in addition to control and data information from the external host (e.g. over USB). In this case all of the control messages are in contention for access to DMAChannel0. We resolve these potential conflicts by adopting the following conventions:

[1601] 1) Control messages may be interleaved in a memory buffer: The memory buffer that the DMAChannel0 points to should be regarded as a central pool of control messages. Every control message must contain fields that identify the size of the message, the source and the destination of the control message. Control messages may therefore be multiplexed over a DMAChannel which allows several control message sources to address the same DMAChannel. Furthermore, if SoPEC-type control messages contain source and destination fields it is possible for the external host to send control messages to individual SoPECs over the ISI15.0 broadcast channel.

[1602] 2) Data messages should not be interleaved in a memory buffer: As data messages are typically part of a much larger block of data that is being transferred it is not possible to control their contents in the same manner as is possible with the control messages. Furthermore we do not want the CPU to have to perform reassembly of data blocks. Data messages from different sources cannot be interleaved over the same DMAChannel; the SCB map must be reconfigured each time a different data source is given access to the DMAChannel.

[1603] 3) Every reconfiguration of the SCB map requires the exchange of control messages: SoPEC's SCB map reset state is shown in Table and any subsequent modifications to this map require the exchange of control messages between the SoPEC and the external host. As the external host is expected to control the movement of data in any SoPEC system it is anticipated that all changes to the SCB map will be performed in response to a request from the external host. While the SoPEC could autonomously reconfigure the SCB map (this is entirely up to the software running on the SoPEC) it should not do so without informing the external host in order to avoid data being misrouted.

[1604] An example of the above conventions in operation is worked through in section 12.5.2.3.
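Convention 1 relies on each control message carrying its own size, source and destination. The sketch below shows how software might split a pool of interleaved control messages. The header layout (a 16 bit little-endian payload length followed by one source byte and one destination byte) is purely hypothetical, as are the function names; the specification does not define a message format here.

```python
import struct

# Hypothetical header: payload length (16 bit), source id, destination id
HEADER = struct.Struct("<HBB")


def pack_message(source, destination, payload):
    """Build one control message with the hypothetical header."""
    return HEADER.pack(len(payload), source, destination) + payload


def split_control_messages(buffer):
    """Split a buffer of back-to-back control messages into tuples."""
    messages = []
    offset = 0
    while offset < len(buffer):
        if offset + HEADER.size > len(buffer):
            raise ValueError("truncated header")
        length, source, destination = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            raise ValueError("truncated payload")
        messages.append((source, destination, bytes(buffer[start:end])))
        offset = end
    return messages
```

Because every message is self-describing, messages from several sources can share the DMAChannel0 buffer and still be separated by the CPU, which is not possible for raw data messages.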

[1605] 12.5.2.1 SCB Map Rules

[1606] The operation of the SCB map is described by these 2 rules:

[1607] Rule 1: A packet is routed to the DMA manager if it originates from the USB device core and has an ISIId that matches the local SoPEC ISIId.

[1608] Rule 2: A packet is routed to the ISI if it originates from the CPU or has an ISIId that does not match the local SoPEC ISIId.

[1609] If the CPU erroneously addresses a packet to the ISIId contained in the ISIId register (i.e. the ISIId of the local SoPEC) then that packet will be transmitted on the ISI rather than be sent to the DMA manager. While this will usually cause an error on the ISI there is one situation where it could be beneficial, namely for initial dialog in a 2 SoPEC system as both devices come out of reset with an ISIId of 0.
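The two rules can be expressed as a small decision function. This is a sketch only: the function name and the string labels for sources and destinations are hypothetical, and only the two packet origins named in the rules (the USB device core and the CPU) are modelled.

```python
def route_packet(source, packet_isi_id, local_isi_id):
    """Apply the two SCB map rules.

    source is "usb" (USB device core) or "cpu"; returns "dma" or "isi".
    """
    if source == "cpu":
        return "isi"      # Rule 2: CPU packets always go to the ISI
    if source == "usb":
        if packet_isi_id == local_isi_id:
            return "dma"  # Rule 1: local USB packets go to the DMA manager
        return "isi"      # Rule 2: non-matching ISIId goes to the ISI
    raise ValueError("unknown packet source")
```

Note that a CPU packet addressed to the local ISIId still returns "isi", which is the corner case discussed in the preceding paragraph.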

[1610] 12.5.2.2 External Host to ISIMaster SoPEC Communication

[1611] Although the SCB map configuration is independent of ISIMaster status, the following discussion on SCB map configurations assumes the ISIMaster is a SoPEC device rather than an ISI bridge chip, and that only a single USB connection to the external host is present. The information should apply broadly to an ISI-Bridge but we focus here on an ISIMaster SoPEC for clarity.

[1612] As the ISIMaster SoPEC represents the printer device on the PC USB bus it is required by the USB specification to have a dedicated control endpoint, EP0. At boot time the ISIMaster SoPEC will also require a bulk data endpoint to facilitate the transfer of program code from the external host. The simplest SCB map configuration, i.e. for a single stand-alone SoPEC, is sufficient for external host to ISIMaster SoPEC communication and is shown in Table 36.
TABLE 36 Single SoPEC SCB map configuration
Source Sink
EP0 ISI0.0
EP1 ISI0.1
EP2 nc
EP3 nc
EP4 nc

[1613] In this configuration all USB control information is exchanged between the external host and SoPEC over EP0 (which is the only bidirectional USB endpoint). SoPEC specific control information (printer status, DNC info etc.) is also exchanged over EP0.

[1614] All packets sent to the external host from SoPEC over EP0 must be written into the DMA mapped EP buffer by the CPU (LEON-PC dataflow in FIG. 29). All packets sent from the external host to SoPEC are placed in DRAM by the DMA Manager, where they can be read by the CPU (PC-DIU dataflow in FIG. 29). This asymmetry is because in a multi-SoPEC environment the CPU will need to examine all incoming control messages (i.e. messages that have arrived over DMAChannel0) to ascertain their source and destination (i.e. they could be from an ISISlave and destined for the external host) and so the additional overhead in having the CPU move the short control messages to the EP0 FIFO is relatively small. Furthermore we wish to avoid making the SCB more complicated than necessary, particularly when there is no significant performance gain to be had as the control traffic will be relatively low bandwidth.

[1615] The above mechanisms are appropriate for the types of communication outlined in sections 12.1.2.1.1 through 12.1.2.1.4.

[1616] 12.5.2.3 Broadcast Communication

[1617] The SCB configuration for broadcast communication is also the default, post power-on reset, configuration for SoPEC and is shown in Table 37.
TABLE 37 Default SoPEC SCB map configuration
Source Sink
EP0 ISI0.0
EP1 ISI0.1
EP2 ISI15.0
EP3 ISI15.1
EP4 ISI1.1

[1618] USB endpoints EP2 and EP3 are mapped onto ISISubId0 and ISISubId1 of ISIId15 (the broadcast ISIId channel). EP0 is used for control messages as before and EP1 is a bulk data endpoint for the ISIMaster SoPEC. Depending on what is convenient for the boot loader software, EP1 may or may not be used during the initial program download, but EP1 is highly likely to be used for compressed page or other program downloads later. For this reason it is part of the default configuration. In this setup the USB device configuration will take place, as it always must, by exchanging messages over the control channel (EP0).

[1619] One possible boot mechanism is where the external host sends the bootloader1 program code to all SoPECs by broadcasting it over EP3. Each SoPEC in the system then authenticates and executes the bootloader1 program. The ISIMaster SoPEC then polls each ISISlave (over the ISIx.0 channel). Each ISISlave ascertains its ISIId by sampling the particular GPIO pins required by the bootloader1 and reporting its presence and status back to the ISIMaster. The ISIMaster then passes this information back to the external host over EP0. Thus both the external host and the ISIMaster have knowledge of the number of SoPECs, and their ISIIds, in the system. The external host may then reconfigure the SCB map to better optimise the SCB resources for the particular multi-SoPEC system. This could involve simplifying the default configuration to a single SoPEC system or remapping the broadcast channels onto DMAChannels in individual ISISlaves.

[1620] The following steps are required to reconfigure the SCB map from the configuration depicted in Table to one where EP3 is mapped onto ISI1.0:

[1621] 1) The external host sends a control message(s) to the ISIMaster SoPEC requesting that USB EP3 be remapped to ISI1.0.

[1622] 2) The ISIMaster SoPEC sends a control message to the external host informing it that EP3 has now been mapped to ISI1.0 (and therefore the external host knows that the previous mapping of ISI15.1 is no longer available through EP3).

[1623] 3) The external host may now send control messages directly to ISISlave1 without requiring any CPU intervention on the ISIMaster SoPEC.
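The effect of the remapping steps on the map itself can be sketched in software. The dictionary below mirrors the default configuration of Table 37; the representation (endpoint name to an (ISIId, ISISubId) pair) and the helper names remap and sink_name are hypothetical and do not reflect the actual register layout.

```python
# Default SCB map (Table 37): endpoint -> (ISIId, ISISubId)
DEFAULT_SCB_MAP = {
    "EP0": (0, 0),
    "EP1": (0, 1),
    "EP2": (15, 0),
    "EP3": (15, 1),
    "EP4": (1, 1),
}


def remap(scb_map, endpoint, isi_id, isi_sub_id):
    """Return a copy of the map with one endpoint pointed at a new sink."""
    if endpoint not in scb_map:
        raise KeyError(endpoint)
    if not 0 <= isi_id <= 15 or isi_sub_id not in (0, 1):
        raise ValueError("invalid ISIId or ISISubId")
    updated = dict(scb_map)
    updated[endpoint] = (isi_id, isi_sub_id)
    return updated


def sink_name(sink):
    """Format a sink in the ISIx.y notation used in this document."""
    return "ISI%d.%d" % sink
```

After remap(DEFAULT_SCB_MAP, "EP3", 1, 0) the EP3 entry reads ISI1.0, so the broadcast sink ISI15.1 is no longer reachable through EP3, as noted in step 2.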

[1624] 12.5.2.4 External Host to ISISlave SoPEC Communication

[1625] If the ISIMaster is configured correctly (e.g. when the ISIMaster is a SoPEC, and that SoPEC's SCB map is configured correctly) then data sent from the external host destined for an ISISlave will be transmitted on the ISI with the correct address. The ISI automatically forwards any data addressed to it (including broadcast data) to the DMA channel with the appropriate ISISubId. If the ISISlave has data to send to the external host it must do so by sending a control message to the ISIMaster identifying the external host as the intended recipient. It is then the ISIMaster's responsibility to forward this message to the external host.

[1626] With this configuration the external host can communicate with the ISISlave via broadcast messages only and this is the mechanism by which the bootloader1 program is downloaded. The ISISlave is unable to communicate with the external host (or the ISIMaster) until the bootloader1 program has successfully executed and the ISISlave has determined what its ISIId is. After the bootloader1 program (and possibly other programs) has executed the SCB map of the ISIMaster may be reconfigured to reflect the most appropriate topology for the particular multi-SoPEC system it is part of.

[1627] All communication from an ISISlave to external host is either achieved directly (if there is a direct USB connection present for example) or by sending messages via the ISIMaster. The ISISlave can never initiate communication to the external host. If an ISISlave wishes to send a message to the external host via the ISIMaster it must wait until it is pinged by the ISIMaster and then send the message in a long packet addressed to the ISIMaster. When the ISIMaster receives the message from the ISISlave it first examines it to determine the intended destination and will then copy it into the EP0 FIFO for transmission to the external host. The software running on the ISIMaster is responsible for any arbitration between messages from different sources (including itself) that are all destined for the external host.

[1628] The above mechanisms are appropriate for the types of communication outlined in sections 12.1.2.1.5 and 12.1.2.1.6.

[1629] 12.5.2.5 ISIMaster to ISISlave Communication

[1630] All ISIMaster to ISISlave communication takes place over the ISI. Immediately after reset this can only be by means of broadcast messages. Once the bootloader1 program has successfully executed on all SoPECs in a multi-SoPEC system the ISIMaster can communicate with each SoPEC on an individual basis.

[1631] If an ISISlave wishes to send a message to the ISIMaster it may do so in response to a ping packet from the ISIMaster. When the ISIMaster receives the message from the ISISlave it must interpret the message to determine if the message contains information required to be sent to the external host. In the case of the ISIMaster being a SoPEC, software will transfer the appropriate information into the EP0 FIFO for transmission to the external host.

[1632] The above mechanisms are appropriate for the types of communication outlined in sections 12.1.2.3.3 and 12.1.2.3.4.

[1633] 12.5.2.6 ISISlave to ISISlave Communication

[1634] ISISlave to ISISlave communication is expected to be limited to two special cases: (a) when the PrintMaster is not the ISIMaster and (b) when a storage SoPEC is used. When the PrintMaster is not the ISIMaster then it will need to send control messages (and receive responses to these messages) to other ISISlaves. When a storage SoPEC is present it may need to send data to each SoPEC in the system. All ISISlave to ISISlave communication will take place in response to ping messages from the ISIMaster.

[1635] 12.5.2.7 Use of the SCB Map in an ISISlave with an External Host Connection

[1636] After reset any SoPEC (regardless of ISIMaster/Slave status) with an active USB connection will route packets from EP0 and EP1 to DMA channels 0 and 1 because the default SCB map is to map EP0 to ISIId0.0 and EP1 to ISIId0.1 and the default ISIId is 0. At some later time the SoPEC learns its true ISIId for the system it is in and re-configures its ISIId and SCB map registers accordingly. Thus if the true ISIId is 3 the external host could reconfigure the SCB map so that EP0 and EP1 (or any other endpoints for that matter) map to ISIId3.0 and 3.1 respectively. The co-ordination of the updating of the ISIId registers and the SCB map is a matter for software to take care of. While the AutoMasterEnable bit of the ISICntrl register is set the external host must not send packets down EP2-4 of the USB connection to the device intended to be an ISISlave. When AutoMasterEnable has been cleared the external host may send data down any endpoint of the USB connection to the ISISlave.

[1637] The SCB map of an ISISlave can be configured to route packets from any EP to any ISIId.ISISubId (just as an ISIMaster can). As with an ISIMaster these packets will end up in the SCBTxBuffer but while an ISIMaster would just transmit them when it got a local access slot (from ping arbitration) the ISISlave can only transmit them in response to a ping. All this would happen without CPU intervention on the ISISlave (or ISIMaster) and as long as the ping frequency is sufficiently high it would enable maximum use of the bandwidth on both USB buses.

[1638] 12.5.3 DMA Manager

[1639] The DMA manager manages the flow of data between the SCB and the embedded DRAM. Whilst the CPU could be used for the movement of data in SoPEC, a DMA manager is a more efficient solution as it will handle data in a more predictable fashion with less latency and requiring less buffering. Furthermore a DMA manager is required to support the ISI transfer speed and to ensure that the SoPEC could be used with a high speed ISI-Bridge chip in the future.

[1640] The DMA manager utilizes 2 write channels (DMAChannel0, DMAChannel1) and 1 read/write channel (DMAChannel2) to provide 2 independent modes of access to DRAM via the DIU interface:

[1641] USBD/ISI type access.

[1642] USBH type access.

[1643] DIU read and write access is in bursts of 4×64 bit words. Byte aligned write enables are provided for write access. Data for DIU write accesses will be read directly from the buffers contained in the respective SCB sub-blocks. There is no internal SCB DMA buffer. The DMA manager handles all issues relating to byte/word/longword address alignment, data endianness and transaction scheduling. If a DMA channel is disabled during a DMA access, the access will be completed. Arbitration will be performed between the following DIU access requests:

[1644] USBD write request.

[1645] ISI write request.

[1646] USBH write request.

[1647] USBH read request.

[1648] DMAChannel0 will have absolute priority over all other DMA requesters. In the absence of DMAChannel0 DMA requests, arbitration will be performed in a round robin manner, on a per cycle basis over the other channels.
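The arbitration policy can be modelled as follows. This is a behavioural sketch only: the class name DmaArbiter is hypothetical, requests are identified by channel name, and the rule that the round robin pointer advances past the channel just granted is an assumption not stated in the text.

```python
class DmaArbiter:
    """Fixed priority for DMAChannel0, round robin over the other channels."""

    def __init__(self, other_channels):
        self._others = list(other_channels)
        self._next = 0   # index of the channel to consider first

    def grant(self, requests):
        """Return the channel granted this cycle, or None if nothing requests."""
        if "DMAChannel0" in requests:
            return "DMAChannel0"   # absolute priority
        count = len(self._others)
        for step in range(count):
            index = (self._next + step) % count
            channel = self._others[index]
            if channel in requests:
                self._next = (index + 1) % count   # assumed pointer update
                return channel
        return None
```

With DmaArbiter(["DMAChannel1", "DMAChannel2"]), two channels that request continuously are granted alternately whenever DMAChannel0 is idle.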

[1649] 12.5.3.1 DMA Effective Bandwidth

[1650] The DIU bandwidth available to the DMA manager must be set to ensure adequate bandwidth for all data sources, to avoid back pressure on the USB and the ISI. This is achieved by setting the output (i.e. DIU) bandwidth to be greater than the combined input bandwidths (i.e. USBD+USBH+ISI). The required bandwidth is expected to be 160 Mbits/s (1 bit/cycle @ 160 MHz). The guaranteed DIU bandwidth for the SCB is programmable and may need further analysis once there is better knowledge of the data throughput from the USB IP cores.

[1651] 12.5.3.2 USBD/ISI DMA Access

[1652] The DMA manager uses the two independent unidirectional write channels for this type of DMA access, one for each ISISubId, to control the movement of data. Both DMAChannel0 and DMAChannel1 only support write operation and can transfer data from any USB device DMA mapped EP buffer and from the ISI receive buffer to separate circular buffers in DRAM, corresponding to each DMA channel.

[1653] While the DMA manager performs the work of moving data the CPU controls the destination and relative timing of data flows to and from the DRAM. The management of the DRAM data buffers requires the CPU to have accurate and timely visibility of both the DMA and PEP memory usage. In other words when the PEP has completed processing of a page band the CPU needs to be aware of the fact that an area of memory has been freed up to receive incoming data. The management of these buffers may also be performed by the external host.

[1654] 12.5.3.2.1 Circular Buffer Operation

[1655] The DMA manager supports the use of circular buffers for both DMAChannels. Each circular buffer is controlled by 5 registers: DMAnBottomAdr, DMAnTopAdr, DMAnMaxAdr, DMAnCurrWPtr and DMAnIntAdr. The operation of the circular buffers is shown in FIG. 53 below.

[1656] Here we see two snapshots of the status of a circular buffer with (b) occurring sometime after (a) and some CPU writes to the registers occurring in between (a) and (b). These CPU writes are most likely to be as a result of a finished band interrupt (which frees up buffer space) but could also have occurred in a DMA interrupt service routine resulting from DMAnIntAdr being hit. The DMA manager will continue filling the free buffer space depicted in (a), advancing the DMAnCurrWPtr after each write to the DIU. Note that the DMAnCurrWPtr register always points to the next address the DMA manager will write to. When the DMA manager reaches the address in DMAnIntAdr (i.e. DMAnCurrWPtr=DMAnIntAdr) it will generate an interrupt if the DMAnIntAdrMask bit in the DMAMask register is set. The purpose of the DMAnIntAdr register is to alert the CPU that data (such as a control message or a page or band header) has arrived that it needs to process. The interrupt routine servicing the DMA interrupt will change the DMAnIntAdr value to the next location that data of interest to the CPU will have arrived by.

[1657] In the scenario shown in FIG. 53 the CPU has determined (most likely as a result of a finished band interrupt) that the filled buffer space in (a) has been freed up and is therefore available to receive more data. The CPU therefore moves the DMAnMaxAdr to the end of the section that has been freed up and moves the DMAnIntAdr address to an appropriate offset from the DMAnMaxAdr address. The DMA manager continues to fill the free buffer space and when it reaches the address in DMAnTopAdr it wraps around to the address in DMAnBottomAdr and continues from there. DMA transfers will continue indefinitely in this fashion until the DMA manager reaches the address in the DMAnMaxAdr register.

[1658] The circular buffer is initialized by writing the top and bottom addresses to the DMAnTopAdr and DMAnBottomAdr registers, writing the start address (which does not have to be the same as the DMAnBottomAdr even though it usually will be) to the DMAnCurrWPtr register and appropriate addresses to the DMAnIntAdr and DMAnMaxAdr registers. The DMA operation will not commence until a 1 has been written to the relevant bit of the DMAChanEn register.

[1659] While it is possible to modify the DMAnTopAdr and DMAnBottomAdr registers after the DMA has started, this should be done with caution. The DMAnCurrWPtr register should not be written to while the DMAChannel is in operation. DMA operation may be stalled at any time by clearing the appropriate bit of the DMAChanEn register or by disabling an SCB mapping or ISI receive operation.

[1660] 12.5.3.2.2 Non-Standard Buffer Operation

[1661] The DMA manager was designed primarily for use with a circular buffer. However, because the DMA pointers are tested for equality (i.e. interrupts are generated when DMAnCurrWPtr=DMAnIntAdr or DMAnCurrWPtr=DMAnMaxAdr) and no bounds checking is performed on their values (i.e. neither DMAnIntAdr nor DMAnMaxAdr is checked to see if it lies between DMAnBottomAdr and DMAnTopAdr), a number of non-standard buffer arrangements are possible. These include:

[1662] Dustbin buffer: If DMAnBottomAdr, DMAnTopAdr and DMAnCurrWPtr all point to the same location and both DMAnIntAdr and DMAnMaxAdr point anywhere else, then all data for that DMA channel will be dumped into the same location without ever generating an interrupt. This is equivalent to writing to /dev/null on Unix systems.

[1663] Linear buffer: If DMAnMaxAdr and DMAnTopAdr have the same value then the DMA manager will simply fill from DMAnBottomAdr to DMAnTopAdr and then stop. DMAnIntAdr should be outside this buffer or have its interrupt disabled.
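Both non-standard arrangements follow from the equality-only pointer tests, which a short simulation can make concrete. The function below is a hypothetical sketch, not the DMA manager's logic: run_dma and its parameters are illustrative names, addresses are in 256-bit words, and the location at DMAnMaxAdr is assumed to be written before transfers stop.

```python
def run_dma(bottom, top, curr, int_adr, max_adr, n_packets):
    """Simulate up to n_packets writes; return (written addresses, int hits)."""
    writes, int_hits = [], 0
    for _ in range(n_packets):
        writes.append(curr)
        last = curr
        # Equality test only: wrap when the write pointer equals the top address.
        curr = bottom if curr == top else curr + 1
        if curr == int_adr:
            int_hits += 1
        # Assumption: transfers stop once the DMAnMaxAdr location has been written.
        if last == max_adr:
            break
    return writes, int_hits


# Dustbin buffer: bottom, top and write pointer coincide; int/max lie elsewhere.
dustbin_writes, dustbin_ints = run_dma(100, 100, 100, int_adr=0, max_adr=1, n_packets=5)

# Linear buffer: max equals top, interrupt address kept outside the buffer.
linear_writes, linear_ints = run_dma(0, 3, 0, int_adr=99, max_adr=3, n_packets=10)
```

With these settings the dustbin case keeps overwriting the single location and never reaches either compare address, while the linear case fills the buffer once and halts.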

[1664] 12.5.3.3 USBH DMA Access

[1665] The USBH requires DMA access to DRAM in order to provide a communication channel between the USB HC and the USB HCD via a shared memory resource. The DMA manager uses two independent channels for this type of DMA access, one for reads and one for writes. The DRAM addresses provided to the DIU interface are generated based on addresses defined in the USB HC core operational registers, in USBH section 12.3.

[1666] 12.5.3.4 Cache Coherency

[1667] As the CPU will be processing some of the data transferred (particularly control messages and page/band headers) into DRAM by the DMA manager, care needs to be taken to ensure that the data it uses is the most recently transferred data. Because the DMA manager will be updating the circular buffers in DRAM without the knowledge of the cache controller logic in the LEON CPU core, the contents of the cache can become outdated. This situation can be easily handled by software, for example by flushing the relevant cache lines, and so there is no hardware support to enforce cache coherency.

[1668] 12.5.4 ISI Transmit Buffer Arbitration

[1669] The SCB control logic will arbitrate access to the ISI transmit buffer (ISITxBuffer) interface on the ISI block. There are two sources of ISI Tx packets:

[1670] CPUISITxBuffer, contained in the SCB control block.

[1671] ISI mapped USB EP OUT buffers, contained in the USB device block.

[1672] This arbitration is controlled by the ISITxBufferArb register, which contains a high priority bit for both the CPU and the USB. If only one of these bits is set then the corresponding source always has priority. Note that if the CPU is given absolute priority over the USB, then the software filling the ISI transmit buffer needs to ensure that sufficient USB traffic is allowed through. If both bits of the ISITxBufferArb register have the same value then arbitration will take place on a round robin basis. The control logic will use the USBEPnDest registers, as it will use the CPUISITxBuffCntrl register, to determine the destination of the packets in these buffers. When the ISITxBuffer has space for a packet, the SCB control logic will immediately seek to refill it. Data will be transferred directly from the CPUISITxBuffer and the ISI mapped USB EP OUT buffers to the ISITxBuffer without any intermediate buffering.
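The priority rule can be summarised as a small decision function. This is a sketch only: pick_source and its arguments are hypothetical names, and it assumes that when just one source has a packet waiting it is served regardless of the priority bits, a case the description above does not spell out.

```python
def pick_source(cpu_priority, usb_priority, cpu_ready, usb_ready, last_winner):
    """Return "cpu", "usb" or None for the next packet loaded into ISITxBuffer."""
    if not (cpu_ready or usb_ready):
        return None                       # nothing to send
    if cpu_ready != usb_ready:
        # Assumption: a lone requester is served whatever the priority bits say.
        return "cpu" if cpu_ready else "usb"
    if cpu_priority != usb_priority:
        # Exactly one high priority bit is set: that source always wins.
        return "cpu" if cpu_priority else "usb"
    # Both bits equal: alternate between the two sources (round robin).
    return "usb" if last_winner == "cpu" else "cpu"
```

Passing the previous winner in explicitly keeps the function stateless, which makes the round robin case easy to check in isolation.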

[1673] As the speed at which the ISITxBuffer can be emptied is at least 5 times greater than the speed at which it can be filled by USB traffic, the ISI mapped USB EP OUT buffers should not overflow using the above scheme in normal operation. There are a number of scenarios which could lead to the USB EPs being temporarily blocked, such as the CPU having priority, retransmissions on the ISI bus, channels being enabled (ChannelEn bit of the USBEPnDest register) with data already in their associated endpoint buffers, or short packets being sent on the USB. Care should be taken to ensure that the USB bandwidth is efficiently utilised at all times.

[1674] 12.5.5 Implementation

[1675] 12.5.5.1 CTRL Sub-Block Partition

[1676] Block Diagram

[1677] Definition of I/Os

[1678] 12.5.5.2 SCB Configuration Registers

[1679] The SCB register map is listed in Table 38. Registers are grouped according to which SCB sub-block their functionality is associated with. All configuration registers reside in the CTRL sub-block. The Reset values in the table indicate the 32 bit hex value that will be returned when the CPU reads the associated address location after reset. All registers prefixed with Hc refer to Host Controller Operational Registers, as defined in the OHCI Spec [19].

[1680] The SCB will only allow supervisor mode accesses to data space (i.e. cpu_acode[1:0]=b11). All other accesses will result in scb_cpu_berr being asserted.

[1681] TBD: Is read access necessary for ISI Rx/Tx buffers? Could implement the ISI interface as simple FIFOs as opposed to a memory interface.

TABLE 38 SCB control block configuration registers

| Address offset from SCB_base | Register | #Bits | Reset | Description |
| --- | --- | --- | --- | --- |
| CTRL | | | | |
| 0x000 | SCBResetN | 4 | 0x0000000F | SCB software reset. Allows individual sub-blocks to be reset separately or together. Once a reset for a block has been initiated, by writing a 0 to the relevant register field, it cannot be suppressed. Each field will be set after reset. Writing 0x0 to the SCBResetN register will have the same effect as a CPR generated hardware reset. |
| 0x004 | SCBGo | 2 | 0x00000000 | SCB Go. Allows the ISI and CTRL sub-blocks to be selected separately or together. When go is de-asserted for a particular sub-block, its state machines are reset to their idle states and its interface signals are de-asserted. The sub-block counters and configuration registers retain their values. When go is asserted for a particular sub-block, its counters are reset. The sub-block configuration registers retain their values, i.e. they don't get reset. The sub-block state machines and interface signals will return to their normal mode of operation. The CTRL field should be de-asserted before disabling the clock from any part of the SCB to avoid erroneous SCB DMA requests when the clock is enabled again. NOTE: This functionality has not been provided for the USBH and USBD sub-blocks because of the USB IP cores that they contain. We do not have direct control over the IP core state machines and counters, and it would cause unpredictable behaviour if the cores were disabled in this way during operation. |
| 0x008 | SCBWakeupEn | 2 | 0x00000000 | USB/ISI WakeUpEnable register. |
| 0x00C | SCBISITxBufferArb | 2 | 0x00000000 | ISI transmit buffer access priority register. |
| 0x010 | SCBDebugSel[11:2] | 10 | 0x00000000 | SCB Debug select register. |
| 0x014 | USBEP0Dest | 7 | 0x00000020 | This register determines which of the data sinks the data arriving in EP0 should be routed to. |
| 0x018 | USBEP1Dest | 7 | 0x00000021 | Data sink mapping for USB EP1. |
| 0x01C | USBEP2Dest | 7 | 0x0000003E | Data sink mapping for USB EP2. |
| 0x020 | USBEP3Dest | 7 | 0x0000003F | Data sink mapping for USB EP3. |
| 0x024 | USBEP4Dest | 7 | 0x00000023 | Data sink mapping for USB EP4. |
| 0x028 | DMA0BottomAdr[21:5] | 17 | | DMAChannel0 bottom address register. |
| 0x02C | DMA0TopAdr[21:5] | 17 | | DMAChannel0 top address register. |
| 0x030 | DMA0CurrWPtr[21:5] | 17 | | DMAChannel0 current write pointer. |
| 0x034 | DMA0IntAdr[21:5] | 17 | | DMAChannel0 interrupt address register. |
| 0x038 | DMA0MaxAdr[21:5] | 17 | | DMAChannel0 max address register. |
| 0x03C | DMA1BottomAdr[21:5] | 17 | | As per DMA0BottomAdr. |
| 0x040 | DMA1TopAdr[21:5] | 17 | | As per DMA0TopAdr. |
| 0x044 | DMA1CurrWPtr[21:5] | 17 | | As per DMA0CurrWPtr. |
| 0x048 | DMA1IntAdr[21:5] | 17 | | As per DMA0IntAdr. |
| 0x04C | DMA1MaxAdr[21:5] | 17 | | As per DMA0MaxAdr. |
| 0x050 | DMAAccessEn | 3 | 0x00000003 | DMA access enable. |
| 0x054 | DMAStatus | 4 | 0x00000000 | DMA status register. |
| 0x058 | DMAMask | 4 | 0x00000000 | DMA mask register. |
| 0x05C - 0x098 | CPUISITxBuff[7:0] | 32x8 | n/a | CPU ISI transmit buffer. 32-byte packet buffer, containing the payload of a CPU sourced packet destined for transmission over the ISI. The CPU has full write access to the CPUISITxBuff. NOTE: The CPU does not have read access to CPUISITxBuff. This is because the CPU is the source of the data and to avoid arbitrating read access between the CPU and the CTRL sub-block. Any CPU reads from this address space will return 0x00000000. |
| 0x09C | CPUISITxBuffCtrl | 9 | 0x00000000 | CPU ISI transmit buffer control register. |
| USBD | | | | |
| 0x100 | USBDIntStatus | 19 | 0x00000000 | USBD Interrupt event status register. |
| 0x104 | USBDISIFIFOStatus | 16 | 0x00000000 | USBD ISI mapped OUT EP packet FIFO status register. |
| 0x108 | USBDDMA0FIFO | 8 | 0x00000000 | USBD DMAChannel0 mapped OUT EP Status packet FIFO status register. |
| 0x10C | USBDDMA1FIFO | 8 | 0x00000000 | USBD DMAChannel1 mapped OUT EP Status packet FIFO status register. |
| 0x110 | USBDResume | 1 | 0x00000000 | USBD core resume register. |
| 0x114 | USBDSetup | 4 | 0x00000000 | USBD setup/configuration register. |
| 0x118 - 0x154 | USBDEp0InBuff[15:0] | 32x16 | n/a | USBD EP0-IN buffer. 64-byte packet buffer containing the payload of a USB packet destined for EP0-IN. The CPU has full write access to the USBDEp0InBuff. NOTE: The CPU does not have read access to USBDEp0InBuff. This is because the CPU is the source of the data and to avoid arbitrating read access between the CPU and the USB device core. Any CPU reads from this address space will return 0x00000000. |
| 0x158 | USBDEp0InBuffCtrl | 1 | 0x00000000 | USBD EP0-IN buffer control register. |
| 0x15C - 0x198 | USBDEp5InBuff[15:0] | 32x16 | n/a | USBD EP5-IN buffer. As per USBDEp0InBuff. |
| 0x19C | USBDEp5InBuffCtrl | 1 | 0x00000000 | USBD EP5-IN buffer control register. |
| 0x1A0 | USBDMask | 19 | 0x00000000 | USBD interrupt mask register. |
| 0x1A4 | USBDDebug | 30 | 0x00000000 | USBD debug register. |
| USBH | | | | |
| 0x200 | HcRevision | | | Refer to [19] for #Bits, Reset, Description. |
| 0x204 | HcControl | | | Refer to [19] for #Bits, Reset, Description. |
| 0x208 | HcCommandStatus | | | Refer to [19] for #Bits, Reset, Description. |
| 0x20C | HcInterruptStatus | | | Refer to [19] for #Bits, Reset, Description. |
| 0x210 | HcInterruptEnable | | | Refer to [19] for #Bits, Reset, Description. |
| 0x214 | HcInterruptDisable | | | Refer to [19] for #Bits, Reset, Description. |
| 0x218 | HcHCCA | | | Refer to [19] for #Bits, Reset, Description. |
| 0x21C | HcPeriodCurrentED | | | Refer to [19] for #Bits, Reset, Description. |
| 0x220 | HcControlHeadED | | | Refer to [19] for #Bits, Reset, Description. |
| 0x224 | HcControlCurrentED | | | Refer to [19] for #Bits, Reset, Description. |
| 0x228 | HcBulkHeadED | | | Refer to [19] for #Bits, Reset, Description. |
| 0x22C | HcBulkCurrentED | | | Refer to [19] for #Bits, Reset, Description. |
| 0x230 | HcDoneHead | | | Refer to [19] for #Bits, Reset, Description. |
| 0x234 | HcFmInterval | | | Refer to [19] for #Bits, Reset, Description. |
| 0x238 | HcFmRemaining | | | Refer to [19] for #Bits, Reset, Description. |
| 0x23C | HcFmNumber | | | Refer to [19] for #Bits, Reset, Description. |
| 0x240 | HcPeriodicStart | | | Refer to [19] for #Bits, Reset, Description. |
| 0x244 | HcLSThreshold | | | Refer to [19] for #Bits, Reset, Description. |
| 0x248 | HcRhDescriptorA | | | Refer to [19] for #Bits, Reset, Description. |
| 0x24C | HcRhDescriptorB | | | Refer to [19] for #Bits, Reset, Description. |
| 0x250 | HcRhStatus | | | Refer to [19] for #Bits, Reset, Description. |
| 0x254 | HcRhPortStatus[1] | | | Refer to [19] for #Bits, Reset, Description. |
| 0x258 | USBHStatus | 3 | 0x00000000 | USBH status register. |
| 0x25C | USBHMask | 2 | 0x00000000 | USBH interrupt mask register. |
| 0x260 | USBHDebug | 2 | 0x00000000 | USBH debug register. |
| ISI | | | | |
| 0x300 | ISICntrl | 4 | 0x0000000B | ISI Control register. |
| 0x304 | ISIId | 4 | 0x00000000 | ISIId for this SoPEC. |
| 0x308 | ISINumRetries | 4 | 0x00000002 | Number of ISI retransmissions register. |
| 0x30C | ISIPingSchedule0 | 15 | 0x00000000 | ISI Ping schedule 0 register. |
| 0x310 | ISIPingSchedule1 | 15 | 0x00000000 | ISI Ping schedule 1 register. |
| 0x314 | ISIPingSchedule2 | 15 | 0x00000000 | ISI Ping schedule 2 register. |
| 0x318 | ISITotalPeriod | 4 | 0x0000000F | Reload value of the ISITotalPeriod counter. |
| 0x31C | ISILocalPeriod | 4 | 0x0000000F | Reload value of the ISILocalPeriod counter. |
| 0x320 | ISIIntStatus | 4 | 0x00000000 | ISI interrupt status register. |
| 0x324 | ISITxBuffStatus | 27 | 0x00000000 | ISI Tx buffer status register. |
| 0x328 | ISIRxBuffStatus | 27 | 0x00000000 | ISI Rx buffer status register. |
| 0x32C | ISIMask | 4 | 0x00000000 | ISI Interrupt mask register. |
| 0x330 - 0x34C | ISITxBuffEntry0[7:0] | 32x8 | n/a | ISI transmit Buff, packet entry #0. 32-byte packet entry in the ISITxBuff, containing the payload of an ISI Tx packet. CPU read access to ISITxBuffEntry0 is provided for observability only, i.e. CPU reads of the ISITxBuffEntry0 do not alter the state of the buffer. The CPU does not have write access to the ISITxBuffEntry0. |
| 0x350 - 0x36C | ISITxBuffEntry1[7:0] | 32x8 | n/a | ISI transmit Buff, packet entry #1. As per ISITxBuffEntry0. |
| 0x370 - 0x38C | ISIRxBuffEntry0[7:0] | 32x8 | n/a | ISI receive Buff, packet entry #0. 32-byte packet entry in the ISIRxBuff, containing the payload of an ISI Rx packet. Note that only error-free long packets are placed in the ISIRxBuffEntry0. Both pings and ACKs are consumed in the ISI. CPU access to ISIRxBuffEntry0 is provided for observability only, i.e. CPU reads of the ISIRxBuffEntry0 do not alter the state of the buffer. |
| 0x390 - 0x3AC | ISIRxBuffEntry1[7:0] | 32x8 | n/a | ISI receive Buff, packet entry #1. As per ISIRxBuffEntry0. |
| 0x3B0 | ISISubId0Seq | 1 | 0x00000000 | ISI sub ID 0 sequence bit register. |
| 0x3B4 | ISISubId1Seq | 1 | 0x00000000 | ISI sub ID 1 sequence bit register. |
| 0x3B8 | ISISubIdSeqMask | 2 | 0x00000000 | ISI sub ID sequence bit mask register. |
| 0x3BC | ISINumPins | 1 | 0x00000000 | ISI number of pins register. |
| 0x3C0 | ISITurnAround | 4 | 0x0000000F | ISI bus turn around register. |
| 0x3C4 | ISITShortReplyWin | 5 | 0x0000001F | ISI short packet reply window. |
| 0x3C8 | ISITLongReplyWin | 9 | 0x000001FF | ISI long packet reply window. |
| 0x3CC | ISIDebug | 4 | 0x00000000 | ISI debug register. |

[1682] A detailed description of each register format follows. The CPU has full read access to all registers. Write access to the fields of each register is defined as:

[1683] Full: The CPU has full write access to the field, i.e. the CPU can write a 1 or a 0 to each bit.

[1684] Clear: The CPU can clear the field by writing a 1 to each bit. Writing a 0 to this type of field will have no effect.

[1685] None: The CPU has no write access to the field, i.e. a CPU write will have no effect on the field.
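The three write-access types can be expressed as a single update rule on a field value. The helper below is an illustrative sketch; apply_write and its parameter names are assumptions introduced here, not part of the register interface.

```python
def apply_write(current, value, access, width):
    """Return the new field value after a CPU write of `value`."""
    mask = (1 << width) - 1
    value &= mask
    if access == "full":
        return value                      # every bit takes the written value
    if access == "clear":
        return current & ~value & mask    # writing 1 clears a bit, 0 leaves it
    if access == "none":
        return current                    # CPU writes have no effect
    raise ValueError(f"unknown access type: {access}")
```

For a Clear field, only the bits written as 1 are cleared, so software can acknowledge one status bit without disturbing the others.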

[1686] 12.5.5.2.1 SCBResetN

TABLE 39 SCBResetN register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| CTRL | 0 | Full | scb_ctrl sub-block reset. Writing 0 to this field will reset the SCB control sub-block logic, including all configuration registers. 0 = reset, 1 = default state |
| ISI | 1 | Full | scb_isi sub-block reset. Writing 0 to this field will reset the ISI sub-block logic. 0 = reset, 1 = default state |
| USBH | 2 | Full | scb_usbh sub-block reset. Writing 0 to this field will reset the USB host controller core and associated logic. 0 = reset, 1 = default state |
| USBD | 3 | Full | scb_usbd sub-block reset. Writing 0 to this field will reset the USB device controller core and associated logic. 0 = reset, 1 = default state |

[1687] 12.5.5.2.2 SCBGo

TABLE 40 SCBGo register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| CTRL | 0 | Full | scb_ctrl sub-block go. 0 = halted, 1 = running |
| ISI | 1 | Full | scb_isi sub-block go. 0 = halted, 1 = running |

[1688] 12.5.5.2.3 SCBWakeUpEn

[1689] This register is used to gate the propagation of the USB and ISI reset signals to the CPR block.

TABLE 41 SCBWakeUpEn register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| USBWakeUpEn | 0 | Full | usb_cpr_reset_n propagation enable. 1 = enable, 0 = disable |
| ISIWakeUpEn | 1 | Full | isi_cpr_reset_n propagation enable. 1 = enable, 0 = disable |

[1690] 12.5.5.2.4 SCBISITxBufferArb

[1691] This register determines which source has priority at the ISITxBuffer interface on the ISI block. When a bit is set, priority is given to the relevant source. When both bits have the same value, arbitration will be performed in a round-robin manner.

TABLE 42 SCBISITxBufferArb register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| CPUPriority | 0 | Full | CPU priority. 1 = high priority, 0 = low priority |
| USBPriority | 1 | Full | USB priority. 1 = high priority, 0 = low priority |

[1692] 12.5.5.2.5 SCBDebugSel

[1693] Contains the address of the register selected for debug observation as it would appear on cpu_adr. The contents of the selected register are output on the scb_cpu_data bus while cpu_scb_sel is low, and scb_cpu_debug_valid is asserted to indicate the debug data is valid. It is expected that a number of pseudo-registers will be made available for debug observation and these will be outlined with the implementation details.

TABLE 43 SCBDebugSel register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| CPUAdr | 11:2 | Full | cpu_adr register address. |

[1694] 12.5.5.2.6 USBEPnDest

[1695] This register description applies to USBEP0Dest, USBEP1Dest, USBEP2Dest, USBEP3Dest and USBEP4Dest. The SCB has two routing options for each packet received, based on the DestISIId associated with the packet's source EP:

[1696] To the DMA Manager

[1697] To the ISI

[1698] The SCB map therefore does not need special fields to identify the DMAChannels on the ISIMaster SoPEC as this is taken care of by the SCB hardware. Thus the USBEP0Dest and USBEP1Dest registers should be programmed with 0x20 and 0x21 (for ISI0.0 and ISI0.1) respectively to ensure data arriving on these endpoints is moved directly to DRAM.

TABLE 44 USBEPnDest register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| SequenceBit | 0 | Full | Sequence bit for packets going from USBEPn to DestISIId.DestISISubId. Every CPU write to this register initialises the value of the sequence bit and this is subsequently updated by the ISI after every successful long packet transmission. |
| DestISIId | 4:1 | Full | Destination ISI ID. Denotes the ISIId of the target SoPEC as per Table |
| DestISISubId | 5 | Full | Destination ISI sub ID. Indicates which DMAChannel of the target SoPEC the endpoint is mapped onto: 0 = DMAChannel0, 1 = DMAChannel1 |
| ChannelEn | 6 | Full | Communication channel enable bit for EPn. This enables/disables the communication channel for EPn. When disabled, the SCB will not accept USB packets addressed to EPn. 0 = Channel disabled, 1 = Channel enabled |

[1699] If the local SoPEC is connected to an external USB host, it is recommended that the EP0 communication channel should always remain enabled and mapped to DMAChannel0 on the local SoPEC, as this is intended as the primary control communication channel between the external USB host and the local SoPEC.

[1700] A SoPEC ISIMaster should map as many USB endpoints, under the control of the external host, as are required for the multi-SoPEC system it is part of. As already mentioned this mapping may be dynamically reconfigured.

[1701] 12.5.5.2.7 DMAnBottomAdr

[1702] This register description applies to DMA0BottomAdr and DMA1BottomAdr.

TABLE 45 DMAnBottomAdr register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAnBottomAdr | 21:5 | Full | The 256-bit aligned DRAM address of the bottom of the circular buffer (inclusive) serviced by DMAChanneln. |

[1703] 12.5.5.2.8 DMAnTopAdr

[1704] This register description applies to DMA0TopAdr and DMA1TopAdr.

TABLE 46 DMAnTopAdr register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAnTopAdr | 21:5 | Full | The 256-bit aligned DRAM address of the top of the circular buffer (inclusive) serviced by DMAChanneln. |

[1705] 12.5.5.2.9 DMAnCurrWPtr

[1706] This register description applies to DMA0CurrWPtr and DMA1CurrWPtr.

TABLE 47 DMAnCurrWPtr register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAnCurrWPtr | 21:5 | Full | The 256-bit aligned DRAM address of the next location DMAChanneln will write to. This register is set by the CPU at the start of a DMA operation and dynamically updated by the DMA manager during the operation. |

[1707] 12.5.5.2.10 DMAnIntAdr

[1708] This register description applies to DMA0IntAdr and DMA1IntAdr.

TABLE 48 DMAnIntAdr register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAnIntAdr | 21:5 | Full | The 256-bit aligned DRAM address of the location that will trigger an interrupt when reached by the DMAChanneln buffer. |

[1709] 12.5.5.2.11 DMAnMaxAdr

[1710] This register description applies to DMA0MaxAdr and DMA1MaxAdr.

TABLE 49 DMAnMaxAdr register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAnMaxAdr | 21:5 | Full | The 256-bit aligned DRAM address of the last free location in the DMAChanneln circular buffer. DMAChanneln transfers will stop when it reaches this address. |
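Each of the five DMA address registers holds a 17-bit field labelled [21:5] that describes a 256-bit aligned location. A minimal sketch of the conversion follows, assuming the field carries bits 21:5 of a byte address (so the low five bits are implicitly zero); the function names are hypothetical and the byte-address interpretation is an assumption, not something the register descriptions state.

```python
WORD_BYTES = 32          # one 256-bit DRAM word
FIELD_MASK = 0x1FFFF     # 17 bits, register bits [21:5]

def to_reg_field(byte_addr):
    """Convert a 256-bit aligned byte address to a [21:5] field value."""
    if byte_addr % WORD_BYTES:
        raise ValueError("address is not 256-bit aligned")
    if byte_addr < 0 or byte_addr >> 22:
        raise ValueError("address does not fit in bits 21:0")
    return byte_addr >> 5

def from_reg_field(field):
    """Convert a [21:5] field value back to a byte address."""
    return (field & FIELD_MASK) << 5
```

Under this assumption the largest representable location is byte address 0x3FFFE0, the last 256-bit word of a 4 MB space.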

[1711] 12.5.5.2.12 DMAAccessEn

[1712] This register enables DMA access for the various requesters, on a per channel basis.

TABLE 50 DMAAccessEn register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAChannel0En | 0 | Full | DMA Channel #0 access enable. This uni-directional write channel is used by the USBD and the ISI. 1 = enable, 0 = disable |
| DMAChannel1En | 1 | Full | As per DMAChannel0En. |
| DMAChannel2En | 2 | Full | DMA Channel #2 access enable. This bi-directional read/write channel is used by the USBH. 1 = enable, 0 = disable |

[1713] 12.5.5.2.13 DMAStatus

[1714] The status bits are not sticky bits, i.e. they reflect the 'live' status of the channel. DMAChannelNIntAdrHit and DMAChannelNMaxAdrHit status bits may only be cleared by writing to the relevant DMAnIntAdr or DMAnMaxAdr register.

TABLE 51 DMAStatus register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAChannel0IntAdrHit | 0 | None | DMA channel #0 interrupt address hit. 1 = DMAChannel0 has reached the address contained in the DMA0IntAdr register. 0 = default state |
| DMAChannel0MaxAdrHit | 1 | None | DMA channel #0 max address hit. 1 = DMAChannel0 has reached the address contained in the DMA0MaxAdr register. 0 = default state |
| DMAChannel1IntAdrHit | 3 | None | As per DMAChannel0IntAdrHit. |
| DMAChannel1MaxAdrHit | 4 | None | As per DMAChannel0MaxAdrHit. |

[1715] 12.5.5.2.14 DMAMask Register

[1716] All bits of the DMAMask are both readable and writable by the CPU. The DMA manager cannot alter the value of this register. All interrupts are generated in an edge sensitive manner, i.e. the DMA manager will generate a dma_icu_irq pulse each time a status bit goes high and its corresponding mask bit is enabled.

TABLE 52 DMAMask register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| DMAChannel0IntAdrHitIntEn | 0 | Full | DMAChannel0IntAdrHit status interrupt enable. 1 = enable, 0 = disable |
| DMAChannel0MaxAdrHitIntEn | 1 | Full | DMAChannel0MaxAdrHit status interrupt enable. 1 = enable, 0 = disable |
| DMAChannel1IntAdrHitIntEn | 2 | Full | As per DMAChannel0IntAdrHitIntEn. |
| DMAChannel1MaxAdrHitIntEn | 3 | Full | As per DMAChannel0MaxAdrHitIntEn. |
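Edge-sensitive generation means a pulse is produced only on a 0 to 1 transition of an enabled status bit, not while the bit stays high. The function below is a hypothetical sketch of that rule; irq_pulses is an illustrative name, and it assumes for simplicity that status and mask bits occupy the same positions in a 4-bit word.

```python
def irq_pulses(prev_status, new_status, mask, width=4):
    """Return the bits that would each cause a dma_icu_irq pulse."""
    field = (1 << width) - 1
    rising = ~prev_status & new_status & field   # bits that just went 0 -> 1
    return rising & mask                         # only enabled bits raise a pulse
```

A status bit that remains high across two samples yields no further pulse, which is why software must clear the condition before the next event on that bit can be signalled.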

[1717] 12.5.5.2.15 CPUISITxBuffCtrl Register

TABLE 53 CPUISITxBuffCtrl register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| PktValid | 0 | Full | This field should be set by the CPU to indicate the validity of the CPUISITxBuff contents. This field will be cleared by the SCB once the contents of the CPUISITxBuff have been copied to the ISITxBuff. NOTE: The CPU should not clear this field under normal operation. If the CPU clears this field during a packet transfer to the ISITxBuff, the transfer will be aborted; this is not recommended. 1 = valid packet. 0 = default state. |
| PktDesc | 3:1 | Full | PktDesc field, as per Table, of the packet contained in the CPUISITxBuff. The CPU is responsible for maintaining the correct sequence bit value for each ISIId.ISISubId channel it communicates with. Only valid when CPUISITxBuffCtrl.PktValid = 1. |
| DestISIId | 7:4 | Full | Denotes the ISIId of the target SoPEC as per Table. |
| DestISISubId | 8 | Full | Indicates which DMAChannel of the target SoPEC the packet in the CPUISITxBuff is destined for. 1 = DMAChannel1, 0 = DMAChannel0 |
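The bit positions in Table 53 can be turned into a packing helper for software that builds the control word. This is a sketch only; pack_cpu_isi_tx_ctrl and unpack_cpu_isi_tx_ctrl are hypothetical names, and the meaning of individual PktDesc values is not assumed here.

```python
def pack_cpu_isi_tx_ctrl(pkt_valid, pkt_desc, dest_isi_id, dest_isi_sub_id):
    """Build a 9-bit CPUISITxBuffCtrl value from its four fields."""
    if pkt_valid not in (0, 1) or dest_isi_sub_id not in (0, 1):
        raise ValueError("single-bit field out of range")
    if not 0 <= pkt_desc <= 0x7:
        raise ValueError("PktDesc is a 3-bit field")
    if not 0 <= dest_isi_id <= 0xF:
        raise ValueError("DestISIId is a 4-bit field")
    return (pkt_valid                  # bit 0
            | (pkt_desc << 1)          # bits 3:1
            | (dest_isi_id << 4)       # bits 7:4
            | (dest_isi_sub_id << 8))  # bit 8

def unpack_cpu_isi_tx_ctrl(value):
    """Split a CPUISITxBuffCtrl value back into its fields."""
    return {
        "PktValid": value & 0x1,
        "PktDesc": (value >> 1) & 0x7,
        "DestISIId": (value >> 4) & 0xF,
        "DestISISubId": (value >> 8) & 0x1,
    }
```

Range checks are done before packing so that an oversized field cannot silently spill into its neighbour.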

[1718] 12.5.5.2.16 USBDIntStatus

[1719] The USBDIntStatus register contains status bits that are related to conditions that can cause an interrupt to the CPU, if the corresponding interrupt enable bits are set in the USBDMask register. The field name extension Sticky implies that the status condition will remain registered until cleared by a CPU write of 1 to each bit of the field.

[1720] NOTE: There is no Ep0IrregPktSticky field because the default control EP will frequently receive packets that are not multiples of 32 bytes during normal operation.

TABLE 54 USBDIntStatus register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| CoreSuspendSticky | 0 | Clear | Device core USB suspend flag. Sticky. 1 = USB suspend state. Set when device core udcvci_suspend signal transitions from 1 -> 0. 0 = default value. |
| CoreUSBResetSticky | 1 | Clear | Device core USB reset flag. Sticky. 1 = USB reset. Set when device core udcvci_reset signal transitions from 1 -> 0. 0 = default value. |
| CoreUSBSOFSticky | 2 | Clear | Device core USB Start Of Frame (SOF) flag. Sticky. 1 = USB SOF. Set when device core udcvci_sof signal transitions from 1 -> 0. 0 = default value. |
| CPUISITxBuffEmptySticky | 3 | Clear | CPU ISI transmit buffer empty flag. Sticky. 1 = empty. 0 = default value. |
| CPUEp0InBuffEmptySticky | 4 | Clear | CPU EP0 IN buffer empty flag. Sticky. 1 = empty. 0 = default value. |
| CPUEp5InBuffEmptySticky | 5 | Clear | CPU EP5 IN buffer empty flag. Sticky. 1 = empty. 0 = default value. |
| Ep0InNAKSticky | 6 | Clear | EP0-IN NAK flag. Sticky. This flag is set if the USB device core issues a read request for EP0-IN and there is not a valid packet present in the EP0-IN buffer. The core will therefore send a NAK response to the IN token that was received from the external USB host. This is an indicator of any back-pressure on the USB caused by EP0-IN. 1 = NAK sent. 0 = default value. |
| Ep5InNAKSticky | 7 | Clear | As per Ep0InNAKSticky. |
| Ep0OutNAKSticky | 8 | Clear | EP0-OUT NAK flag. Sticky. This flag is set if the USB device core issues a write request for EP0-OUT and there is no space in the OUT EP buffer for the packet. The core will therefore send a NAK response to the OUT token that was received from the external USB host. This is an indicator of any back-pressure on the USB caused by EP0-OUT. 1 = NAK sent. 0 = default value. |
| Ep1OutNAKSticky | 9 | Clear | As per Ep0OutNAKSticky. |
| Ep2OutNAKSticky | 10 | Clear | As per Ep0OutNAKSticky. |
| Ep3OutNAKSticky | 11 | Clear | As per Ep0OutNAKSticky. |
| Ep4OutNAKSticky | 12 | Clear | As per Ep0OutNAKSticky. |
| Ep1IrregPktSticky | 13 | Clear | EP1-OUT irregular sized packet flag. Sticky. Indicates a packet that is not a multiple of 32 bytes in size was received by EP1-OUT. 1 = irregular sized packet received. 0 = default value. |
| Ep2IrregPktSticky | 14 | Clear | As per Ep1IrregPktSticky. |
| Ep3IrregPktSticky | 15 | Clear | As per Ep1IrregPktSticky. |
| Ep4IrregPktSticky | 16 | Clear | As per Ep1IrregPktSticky. |
| OutBuffOverFlowSticky | 17 | Clear | OUT EP buffer overflow flag. Sticky. This flag is set if the USB device core attempted to write a packet of more than 64 bytes to the OUT EP buffer. This is a fatal error, suggesting a problem in the USB device IP core. The SCB will take no further action. 1 = overflow condition detected. 0 = default value. |
| InBuffUnderRunSticky | 18 | Clear | IN EP buffer underrun flag. Sticky. This flag is set if the USB device core attempted to read more data than was present from the IN EP buffer. This is a fatal error, suggesting a problem in the USB device IP core. The SCB will take no further action. 1 = underrun condition detected. 0 = default value. |

[1721] 12.5.5.2.17 USBDISIFIFOStatus

[1722] This register contains the status of the ISI mapped OUT EP packet FIFO. This is a secondary status register and will not cause any interrupts to the CPU.

TABLE 55 USBDISIFIFOStatus register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| Entry0Valid | 0 | None | FIFO entry #0 valid field. This flag will be set by the USBD when the USB device core indicates the validity of packet entry #0 in the FIFO. 1 = valid USB packet in ISI OUT EP buffer 0. 0 = default value. |
| Entry0Source | 3:1 | None | FIFO entry #0 source field. Contains the EP associated with packet entry #0 in the FIFO. Binary Coded Decimal. Only valid when Entry0Valid = 1. |
| Entry1Valid | 4 | None | As per Entry0Valid. |
| Entry1Source | 7:5 | None | As per Entry0Source. |
| Entry2Valid | 8 | None | As per Entry0Valid. |
| Entry2Source | 11:9 | None | As per Entry0Source. |
| Entry3Valid | 12 | None | As per Entry0Valid. |
| Entry3Source | 15:13 | None | As per Entry0Source. |

[1723] 12.5.5.2.18 USBDDMANFIFOStatus

[1724] This register description applies to USBDDMA0FIFOStatus and USBDDMA1FIFOStatus. This register contains the status of the DMAChannelN mapped OUT EP packet FIFO. This is a secondary status register and will not cause any interrupts to the CPU.

TABLE 56 USBDDMANFIFOStatus register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| Entry0Valid | 0 | None | FIFO entry #0 valid field. This flag will be set by the USBD when the USB device core indicates the validity of packet entry #0 in the FIFO. 1 = valid USB packet in ISI OUT EP buffer 0. 0 = default value. |
| Entry0Source | 3:1 | None | FIFO entry #0 source field. Contains the EP associated with packet entry #0 in the FIFO. Binary Coded Decimal. Only valid when Entry0Valid = 1. |
| Entry1Valid | 4 | None | As per Entry0Valid. |
| Entry1Source | 7:5 | None | As per Entry0Source. |

[1725] 12.5.5.2.19 USBDResume

[1726] This register causes the USB device core to initiate resume signalling to the external USB host. Only applicable when the device core is in the suspend state.

TABLE 57 USBDResume register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| USBDResume | 0 | Full | USBD core resume register. The USBD will clear this register upon resume notification from the device core. 1 = generate resume signalling. 0 = default value. |

[1727] 12.5.5.2.20 USBDSetup

[1728] This register controls the general setup/configuration of the USBD.

TABLE 58 USBDSetup register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| Ep1IrregPktCntrl | 0 | Full | EP1-OUT irregular sized packet control. An irregular sized packet is defined as a packet that is not a multiple of 32 bytes. 1 = discard irregular sized packets. 0 = read 32 bytes from buffer, regardless of packet size. |
| Ep2IrregPktCntrl | 1 | Full | As per Ep1IrregPktCntrl. |
| Ep3IrregPktCntrl | 2 | Full | As per Ep1IrregPktCntrl. |
| Ep4IrregPktCntrl | 3 | Full | As per Ep1IrregPktCntrl. |

[1729] 12.5.5.2.21 USBDEpNInBuffCtrl Register

[1730] This register description applies to USBDEp0InBuffCtrl and USBDEp5InBuffCtrl.

TABLE 59 USBDEpNInBuffCtrl register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| PktValid | 0 | Full | Setting this register validates the contents of USBDEpNInBuff. This field will be cleared by the SCB once the packet has been successfully transmitted to the external USB host. NOTE: The CPU should not clear this field under normal operation. If the CPU clears this field during a packet transfer to the USB, the transfer will be aborted; this is not recommended. 1 = valid packet. 0 = default state. |

[1731] 12.5.5.2.22 USBDMask

[1732] This register serves as an interrupt mask for all USBD status conditions that can cause a CPU interrupt. Setting a field enables interrupt generation for the associated status event. Clearing a field disables interrupt generation for the associated status event. All interrupts will be generated in an edge sensitive manner, i.e. when the associated status register transitions from 0 -> 1.

TABLE 60 USBDMask register format

| Field Name | Bit(s) | Write access | Description |
| --- | --- | --- | --- |
| CoreSuspendStickyEn | 0 | Full | CoreSuspendSticky status interrupt enable. |
| CoreUSBResetStickyEn | 1 | Full | CoreUSBResetSticky status interrupt enable. |
| CoreUSBSOFStickyEn | 2 | Full | CoreUSBSOFSticky status interrupt enable. |
| CPUISITxBuffEmptyStickyEn | 3 | Full | CPUISITxBuffEmptySticky status interrupt enable. |
| CPUEp0InBuffEmptyStickyEn | 4 | Full | CPUEp0InBuffEmptySticky status interrupt enable. |
| CPUEp5InBuffEmptyStickyEn | 5 | Full | CPUEp5InBuffEmptySticky status interrupt enable. |
| Ep0InNAKStickyEn | 6 | Full | Ep0InNAKSticky status interrupt enable. |
| Ep5InNAKStickyEn | 7 | Full | Ep5InNAKSticky status interrupt enable. |
| Ep0OutNAKStickyEn | 8 | Full | Ep0OutNAKSticky status interrupt enable. |
| Ep1OutNAKStickyEn | 9 | Full | Ep1OutNAKSticky status interrupt enable. |
| Ep2OutNAKStickyEn | 10 | Full | Ep2OutNAKSticky status interrupt enable. |
| Ep3OutNAKStickyEn | 11 | Full | Ep3OutNAKSticky status interrupt enable. |
| Ep4OutNAKStickyEn | 12 | Full | Ep4OutNAKSticky status interrupt enable. |
| Ep1IrregPktStickyEn | 13 | Full | Ep1IrregPktSticky status interrupt enable. |
| Ep2IrregPktStickyEn | 14 | Full | Ep2IrregPktSticky status interrupt enable. |
| Ep3IrregPktStickyEn | 15 | Full | Ep3IrregPktSticky status interrupt enable. |
| Ep4IrregPktStickyEn | 16 | Full | Ep4IrregPktSticky status interrupt enable. |
| OutBuffOverFlowStickyEn | 17 | Full | OutBuffOverFlowSticky status interrupt enable. |
| InBuffUnderRunStickyEn | 18 | Full | InBuffUnderRunSticky status interrupt enable. |

[1733] 12.5.5.2.23 USBDDebug

[1734] This register is intended for debug purposes only. It contains non-sticky versions of all interrupt capable status bits, which are referred to as dynamic in the table.

TABLE 61 USBDDebug register format
Field Name | Bit(s) | Write access | Description
CoreTimeStamp | 10:0 | none | USB device core frame number.
CoreSuspend | 11 | none | Dynamic version of CoreSuspendSticky.
CoreUSBReset | 12 | none | Dynamic version of CoreUSBResetSticky.
CoreUSBSOF | 13 | none | Dynamic version of CoreUSBSOFSticky.
CPUISITxBuffEmpty | 14 | none | Dynamic version of CPUISITxBuffEmptySticky.
CPUEp0InBuffEmpty | 15 | none | Dynamic version of CPUEp0InBuffEmptySticky.
CPUEp5InBuffEmpty | 16 | none | Dynamic version of CPUEp5InBuffEmptySticky.
Ep0InNAK | 17 | none | Dynamic version of Ep0InNAKSticky.
Ep5InNAK | 18 | none | Dynamic version of Ep5InNAKSticky.
Ep0OutNAK | 19 | none | Dynamic version of Ep0OutNAKSticky.
Ep1OutNAK | 20 | none | Dynamic version of Ep1OutNAKSticky.
Ep2OutNAK | 21 | none | Dynamic version of Ep2OutNAKSticky.
Ep3OutNAK | 22 | none | Dynamic version of Ep3OutNAKSticky.
Ep4OutNAK | 23 | none | Dynamic version of Ep4OutNAKSticky.
Ep1IrregPkt | 24 | none | Dynamic version of Ep1IrregPktSticky.
Ep2IrregPkt | 25 | none | Dynamic version of Ep2IrregPktSticky.
Ep3IrregPkt | 26 | none | Dynamic version of Ep3IrregPktSticky.
Ep4IrregPkt | 27 | none | Dynamic version of Ep4IrregPktSticky.
OutBuffOverFlow | 28 | none | Dynamic version of OutBuffOverFlowSticky.
InBuffUnderRun | 29 | none | Dynamic version of InBuffUnderRunSticky.

[1735] 12.5.5.2.24 USBHStatus

[1736] This register contains all status bits associated with the USBH. The field name extension Sticky implies that the status condition will remain registered until cleared by a CPU write.

TABLE 62 USBHStatus register format
Field Name | Bit(s) | Write access | Description
CoreIRQSticky | 0 | clear | HC core IRQ interrupt flag. Sticky. Set when the HC core UHOSTC_IrqN output signal transitions from 0->1. Refer to the OHCI spec for details on HC interrupt processing. 1 = IRQ interrupt from core. 0 = default value.
CoreSMISticky | 1 | clear | HC core SMI interrupt flag. Sticky. Set when the HC core UHOSTC_SmiN output signal transitions from 0->1. Refer to the OHCI spec for details on HC interrupt processing. 1 = SMI interrupt from HC. 0 = default value.
CoreBuffAcc | 2 | none | HC core buffer access flag. HC core UHOSTC_BufAcc output signal. Indicates whether the HC is accessing a descriptor or a buffer in shared system memory. 1 = buffer access. 0 = descriptor access.

[1737] 12.5.5.2.25 USBHMask

[1738] This register serves as an interrupt mask for all USBH status conditions that can cause a CPU interrupt. All interrupts will be generated in an edge sensitive manner, i.e. when the associated status register transitions from 0->1.

TABLE 63 USBHMask register format
Field Name | Bit(s) | Write access | Description
CoreIRQIntEn | 0 | full | CoreIRQSticky status interrupt enable. 1 = enable. 0 = disable.
CoreSMIIntEn | 1 | full | CoreSMISticky status interrupt enable. 1 = enable. 0 = disable.

[1739] 12.5.5.2.26 USBHDebug

[1740] This register is intended for debug purposes only. It contains non-sticky versions of all interrupt capable status bits, which are referred to as dynamic in the table.

TABLE 64 USBHDebug register format
Field Name | Bit(s) | Write access | Description
CoreIRQ | 0 | none | Dynamic version of CoreIRQSticky.
CoreSMI | 1 | none | Dynamic version of CoreSMISticky.

[1741] 12.5.5.2.27 ISICntrl

[1742] This register controls the general setup/configuration of the ISI.

[1743] Note that the reset value of this register allows the SoPEC to automatically become an ISIMaster (AutoMasterEnable=1) if any USB packets are received on endpoints 2-4. On becoming an ISIMaster the ISIMasterSel bit is set and any USB or CPU packets destined for other ISI devices are transmitted. The CPU can override this capability at any time by clearing the AutoMasterEnable bit.

TABLE 65 ISICntrl register format
Field Name | Bit(s) | Write access | Description
TxEnable | 0 | Full | ISI transmit enable. Enables ISI transmission of long or ping packets. ACKs may still be transmitted when this bit is 0. This bit is cleared by transmit errors and needs to be set again by the CPU to restart transmission. 1 = Transmission enabled. 0 = Transmission disabled.
RxEnable | 1 | Full | ISI receive enable. Enables ISI reception. This bit can only be cleared by the CPU, and it is only anticipated that reception will be disabled when the ISI is not in use and the ISI pins are being used by the GPIO for another purpose. 1 = Reception enabled. 0 = Reception disabled.
ISIMasterSel | 2 | Full | ISI master select. Determines whether the SoPEC is an ISIMaster or not. 1 = ISIMaster. 0 = ISISlave.
AutoMasterEnable | 3 | Full | ISI auto master enable. Enables the device to automatically become the ISIMaster if activity is detected on USB endpoints 2-4. 1 = auto-master operation enabled. 0 = auto-master operation disabled.

[1744] 12.5.5.2.28 ISIId

TABLE 66 ISIId register format
Field Name | Bit(s) | Write access | Description
ISIId | 3:0 | Full | ISIId for this SoPEC. SoPEC resets to being an ISISlave with ISIId0. 0xF (the broadcast ISIId) is an illegal value and should not be written to this register.

[1745] 12.5.5.2.29 ISINumRetries

TABLE 67 ISINumRetries register format
Field Name | Bit(s) | Write access | Description
ISINumRetries | 3:0 | Full | Number of ISI retransmissions to attempt in response to an inferred NAK before aborting a long packet transmission.

[1746] 12.5.5.2.30 ISIPingScheduleN

[1747] This register description applies to ISIPingSchedule0, ISIPingSchedule1 and ISIPingSchedule2.

TABLE 68 ISIPingScheduleN register format
Field Name | Bit(s) | Write access | Description
ISIPingSchedule | 14:0 | Full | Denotes which ISIIds will receive ping packets. Note that bit0 refers to ISIId0, bit1 to ISIId1 . . . bit14 to ISIId14.

[1748] 12.5.5.2.31 ISITotalPeriod

TABLE 69 ISITotalPeriod register format
Field Name | Bit(s) | Write access | Description
ISITotalPeriod | 3:0 | Full | Reload value of the ISITotalPeriod counter.

[1749] 12.5.5.2.32 ISILocalPeriod

TABLE 70 ISILocalPeriod register format
Field Name | Bit(s) | Write access | Description
ISILocalPeriod | 3:0 | Full | Reload value of the ISILocalPeriod counter.
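The bit assignment described for ISIPingScheduleN can be illustrated with a short Python sketch that builds and decodes a 15-bit schedule value. The function names are hypothetical and are not part of the SoPEC register interface; only the mapping of bit n to ISIId n is taken from Table 68.

```python
def ping_schedule_mask(isi_ids):
    """Build a 15-bit ISIPingScheduleN value from ISIIds in the range 0..14."""
    mask = 0
    for isi_id in isi_ids:
        if not 0 <= isi_id <= 14:
            # Only bits 14:0 exist; 0xF is the broadcast ISIId and has no bit.
            raise ValueError(f"ISIId {isi_id} has no bit in the schedule")
        mask |= 1 << isi_id
    return mask


def scheduled_ids(mask):
    """List the ISIIds whose bit is set in a schedule value."""
    return [i for i in range(15) if (mask >> i) & 1]
```

For example, scheduling ISIId0, ISIId3 and ISIId14 sets bits 0, 3 and 14 of the register value.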

[1750] 12.5.5.2.33 ISIIntStatus

[1751] The ISIIntStatus register contains status bits that are related to conditions that can cause an interrupt to the CPU, if the corresponding interrupt enable bits are set in the ISIMask register.

TABLE 71 ISIIntStatus register
Field Name | Bit(s) | Write access | Description
TxErrorSticky | 0 | None | ISI transmit error flag. Sticky. The receiving ISI device would not accept the transmitted packet. Only set after NumRetries unsuccessful retransmissions (excluding ping packets). This bit is cleared by the ISI after transmission has been re-enabled by the CPU setting the TxEnable bit of the ISICntrl register. 1 = transmit error. 0 = default state.
RxFrameErrorSticky | 1 | Clear | ISI receive framing error flag. Sticky. This bit is set by the ISI when a framing error is detected in the received packet, which can be caused by an incorrect Start or Stop field or by bit stuffing errors. 1 = framing error detected. 0 = default state.
RxCRCErrorSticky | 2 | Clear | ISI receive CRC error flag. This bit is set by the ISI when a CRC error is detected in an incoming packet. Other than dropping the errored packet, ISI reception is unaffected by a CRC error. 1 = CRC error. 0 = default state.
RxBuffOverFlowSticky | 3 | Clear | ISI receive buffer overflow flag. Sticky. An overflow has occurred in the ISI receive buffer and a packet had to be dropped. 1 = overflow condition detected. 0 = default state.

[1752] 12.5.5.2.34 ISITxBuffStatus

[1753] The ISITxBuffStatus register contains status bits that are related to the ISI Tx buffer. This is a secondary status register and will not cause any interrupts to the CPU.

TABLE 72 ISITxBuffStatus register format
Field Name | Bit(s) | Write access | Description
Entry0PktValid | 0 | None | ISI Tx buffer entry #0 packet valid flag. This flag will be set by the ISI when a valid ISI packet is written to entry #0 in the ISITxBuff for transmission over the ISI bus. A Tx packet is considered valid when it is 32 bytes in size and the ISI has written the packet header information to Entry0PktDesc, Entry0DestISIId and Entry0DestISISubId. 1 = packet valid. 0 = default value.
Entry0PktDesc | 3:1 | None | ISI Tx buffer entry #0 packet descriptor. PktDesc field as per Table for the packet entry #0 in the ISITxBuff. Only valid when Entry0PktValid = 1.
Entry0DestISIId | 7:4 | None | ISI Tx buffer entry #0 destination ISI ID. Denotes the ISIId of the target SoPEC as per Table . Only valid when Entry0PktValid = 1.
Entry0DestISISubId | 8 | None | ISI Tx buffer entry #0 destination ISI sub ID. Indicates which DMAChannel on the target SoPEC that packet entry #0 in the ISITxBuff is destined for. Only valid when Entry0PktValid = 1. 1 = DMAChannel1. 0 = DMAChannel0.
Entry1PktValid | 9 | None | As per Entry0PktValid.
Entry1PktDesc | 12:10 | None | As per Entry0PktDesc.
Entry1DestISIId | 16:13 | None | As per Entry0DestISIId.
Entry1DestISISubId | 17 | None | As per Entry0DestISISubId.
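The field layout of the ISITxBuffStatus register (and of the identically laid out ISIRxBuffStatus register) places each buffer entry in a 9-bit group. A minimal Python sketch of how software might unpack a raw register value follows; the function name and the dictionary keys are illustrative assumptions, while the bit positions are those given in the register format table.

```python
def decode_buff_status_entry(status, entry):
    """Unpack one 9-bit entry (0 or 1) from an ISI Tx/Rx buffer status value."""
    if entry not in (0, 1):
        raise ValueError("only entries 0 and 1 exist")
    bits = (status >> (9 * entry)) & 0x1FF   # entry 0: bits 8:0, entry 1: bits 17:9
    return {
        "pkt_valid": bits & 0x1,             # EntryNPktValid
        "pkt_desc": (bits >> 1) & 0x7,       # EntryNPktDesc
        "dest_isi_id": (bits >> 4) & 0xF,    # EntryNDestISIId
        "dest_isi_sub_id": (bits >> 8) & 1,  # EntryNDestISISubId
    }
```

The descriptor and destination fields are only meaningful when pkt_valid is 1, so a caller would normally check that flag first.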

[1754] 12.5.5.2.35 ISIRxBuffStatus

[1755] The ISIRxBuffStatus register contains status bits that are related to the ISI Rx buffer. This is a secondary status register and will not cause any interrupts to the CPU.

TABLE 73 ISIRxBuffStatus register format
Field Name | Bit(s) | Write access | Description
Entry0PktValid | 0 | None | ISI Rx buffer entry #0 packet valid flag. This flag will be set by the ISI when a valid ISI packet is received and written to entry #0 of the ISIRxBuff. An Rx packet is considered valid when it is 32 bytes in size and no framing or CRC errors were detected. 1 = valid packet. 0 = default value.
Entry0PktDesc | 3:1 | None | ISI Rx buffer entry #0 packet descriptor. PktDesc field as per Table for packet entry #0 of the ISIRxBuff. Only valid when Entry0PktValid = 1.
Entry0DestISIId | 7:4 | None | ISI Rx buffer 0 destination ISI ID. Denotes the ISIId of the target SoPEC as per Table . This should always correspond to the local SoPEC ISIId. Only valid when Entry0PktValid = 1.
Entry0DestISISubId | 8 | None | ISI Rx buffer 0 destination ISI sub ID. Indicates which DMAChannel on the target SoPEC that entry #0 of the ISIRxBuff is destined for. Only valid when Entry0PktValid = 1. 1 = DMAChannel1. 0 = DMAChannel0.
Entry1PktValid | 9 | None | As per Entry0PktValid.
Entry1PktDesc | 12:10 | None | As per Entry0PktDesc.
Entry1DestISIId | 16:13 | None | As per Entry0DestISIId.
Entry1DestISISubId | 17 | None | As per Entry0DestISISubId.

[1756] 12.5.5.2.36 ISIMask Register

[1757] An interrupt will be generated in an edge sensitive manner, i.e. the ISI will generate an isi_icu_irq pulse each time a status bit goes high and the corresponding bit of the ISIMask register is enabled.

TABLE 74 ISIMask register
Field Name | Bit(s) | Write access | Description
TxErrorIntEn | 0 | Full | TxErrorSticky status interrupt enable. 1 = enable. 0 = disable.
RxFrameErrorIntEn | 1 | Full | RxFrameErrorSticky status interrupt enable. 1 = enable. 0 = disable.
RxCRCErrorIntEn | 2 | Full | RxCRCErrorSticky status interrupt enable. 1 = enable. 0 = disable.
RxBuffOverFlowIntEn | 3 | Full | RxBuffOverFlowSticky status interrupt enable. 1 = enable. 0 = disable.
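The edge sensitive behaviour described for the ISIMask register can be modelled in a few lines of Python. This is a behavioural sketch only, not a description of the hardware implementation; the function name is hypothetical, and the 4-bit width follows the four status bits of ISIIntStatus.

```python
def isi_irq_sources(prev_status, new_status, mask):
    """Return the status bits that would trigger an isi_icu_irq pulse.

    A bit contributes only if it went from 0 to 1 between the two
    samples and its enable bit in the ISIMask value is set.
    """
    rising = ~prev_status & new_status   # bits that changed 0 -> 1
    return rising & mask & 0xF           # four status bits, 3:0


def isi_irq_pulse(prev_status, new_status, mask):
    """True if at least one enabled status bit has just gone high."""
    return isi_irq_sources(prev_status, new_status, mask) != 0
```

A status bit that is already high does not cause a further pulse in this model, which is the practical difference between edge sensitive and level sensitive generation.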

[1758] 12.5.5.2.37 ISISubIdNSeq

[1759] This register description applies to ISISubId0Seq and ISISubId1Seq.

TABLE 75 ISISubIdNSeq register format
Field Name | Bit(s) | Write access | Description
ISISubIdNSeq | 0 | Full | ISI sub ID channel N sequence bit. This bit may be initialised by the CPU but is updated by the ISI each time an error-free long packet is received.

[1760] 12.5.5.2.38 ISISubIdSeqMask

TABLE 76 ISISubIdSeqMask register format
Field Name | Bit(s) | Write access | Description
ISISubIdSeq0Mask | 0 | Full | ISI sub ID channel 0 sequence bit mask. Setting this bit ensures that the sequence bit will be ignored for incoming packets for the ISISubId. 1 = ignore sequence bit. 0 = default state.
ISISubIdSeq1Mask | 1 | Full | As per ISISubIdSeq0Mask.

[1761] 12.5.5.2.39 ISINumPins

TABLE 77 ISINumPins register format
Field Name | Bit(s) | Write access | Description
ISINumPins | 0 | Full | Select number of active ISI pins. 1 = 4 pins. 0 = 2 pins.

[1762] 12.5.5.2.40 ISITurnAround

[1763] The ISI bus turnaround time will reset to its maximum value of 0xF to provide a safer starting mode for the ISI bus. This value should be set to a value that is suitable for the physical implementation of the ISI bus, i.e. the lowest turnaround time that the physical implementation will allow without significant degradation of signal integrity.

TABLE 78 ISITurnAround register format
Field Name | Bit(s) | Write access | Description
ISITurnAround | 3:0 | Full | ISI bus turnaround time in ISI clock cycles (32 MHz).

[1764] 12.5.5.2.41 ISIShortReplyWin

[1765] The ISI short packet reply window time will reset to its maximum value of 0x1F to provide a safer starting mode for the ISI bus. This value should be set to a value that will allow for the expected frequency of bit stuffing and receiver response timing.

TABLE 79 ISIShortReplyWin register format
Field Name | Bit(s) | Write access | Description
ISIShortReplyWin | 4:0 | Full | ISI short packet reply window in ISI clock cycles (32 MHz).

[1766] 12.5.5.2.42 ISILongReplyWin

[1767] The ISI long packet reply window time will reset to its maximum value of 0x1FF to provide a safer starting mode for the ISI bus. This value should be set to a value that will allow for the expected frequency of bit stuffing and receiver response timing.

TABLE 80 ISILongReplyWin register format
Field Name | Bit(s) | Write access | Description
ISILongReplyWin | 8:0 | Full | ISI long packet reply window in ISI clock cycles (32 MHz).

[1768] 12.5.5.2.43 ISIDebug

[1769] This register is intended for debug purposes only. It contains non-sticky versions of all interrupt capable status bits, which are referred to as dynamic in the table.

TABLE 81 ISIDebug register format
Field Name | Bit(s) | Write access | Description
TxError | 0 | None | Dynamic version of TxErrorSticky.
RxFrameError | 1 | None | Dynamic version of RxFrameErrorSticky.
RxCRCError | 2 | None | Dynamic version of RxCRCErrorSticky.
RxBuffOverFlow | 3 | None | Dynamic version of RxBuffOverFlowSticky.

[1770] 12.5.5.3 CPU Bus Interface

[1771] 12.5.5.4 Control Core Logic

[1772] 12.5.5.5 DIU Bus Interface

[1773] 12.6 DMA Regs

[1774] All of the circular buffer registers are 256-bit word aligned as required by the DIU. The DMAnBottomAdr and DMAnTopAdr registers are inclusive, i.e. the addresses contained in those registers form part of the circular buffer. The DMAnCurrWPtr always points to the next location the DMA manager will write to, so interrupts are generated whenever the DMA manager reaches the address in either the DMAnIntAdr or DMAnMaxAdr registers rather than when it actually writes to these locations. It therefore cannot write to the location in the DMAnMaxAdr register.

[1775] SCB Map Regs

[1776] The SCB map is configured by mapping a USB endpoint on to a data sink. This is performed on a per-endpoint basis, i.e. each endpoint has a configuration register that allows its data sink to be selected. Mapping an endpoint on to a data sink does not initiate any data flow; each endpoint/data sink needs to be enabled by writing to the appropriate configuration registers for the USBD, ISI and DMA manager.

[1777] 13. General Purpose IO (GPIO)

[1778] 13.1 Overview

[1779] The General Purpose IO block (GPIO) is responsible for control and interfacing of GPIO pins to the rest of the SoPEC system. It provides easily programmable control logic to simplify control of GPIO functions. In all there are 32 GPIO pins, of which any pin can assume any output or input function. Possible output functions are:

[1780] 4 Stepper Motor control Outputs

[1781] 12 Brushless DC Motor Control Outputs (total of 2 different controllers each with 6 outputs)

[1782] 4 General purpose high drive pulsed outputs capable of driving LEDs.

[1783] 4 Open drain IOs used for LSS interfaces

[1784] 4 Normal drive low impedance IOs used for the ISI interface in Multi-SoPEC mode

[1785] Each of the pins can be configured in either input or output mode, and each pin is independently controlled. A programmable de-glitching circuit exists for a fixed number of input pins. Each input is a Schmitt trigger to increase noise immunity should the input be used without the de-glitch circuit. The mapping of the above functions and their alternate use in a slave SoPEC to GPIO pins is shown in Table 82 below.

TABLE 82 GPIO pin type
GPIO pin(s) | Pin IO Type | Default Function
gpio[3:0] | Normal drive, low impedance IO (35 Ohm), integrated pull-up resistor | Pins 1 and 0 in ISI Mode, pins 2 and 3 in input mode
gpio[7:4] | High drive, normal impedance IO (65 Ohm), intended for LED drivers | Input Mode
gpio[31:8] | Normal drive, normal impedance IO (65 Ohm), no pull-up | Input Mode

[1786] 13.2 Stepper Motor Control

[1787] The motor control pins can be directly controlled by the CPU, or the motor control logic can be used to generate the phase pulses for the stepper motors. The controller consists of two central counters from which the control pins are derived. The central counters have several registers (see Table) used to configure the cycle period, the phase, the duty cycle, and counter granularity.

[1788] There are two motor master counters (0 and 1) with identical features. The period of the master counters is defined by the MotorMasterClkPeriod[1:0] and MotorMasterClkSrc registers, i.e. both master counters are derived from the same MotorMasterClkSrc. The MotorMasterClkSrc defines the timing pulses used by the master counters to determine the timing period. The MotorMasterClkSrc can select clock sources of 1 μs, 100 μs, 10 ms and pclk timing pulses.

[1789] The MotorMasterClkPeriod[1:0] registers are set to the number of timing pulses required before the timing period re-starts. Each master counter is set to the relevant MotorMasterClkPeriod value and counts down a unit each time a timing pulse is received.

[1790] The master counters reset to the MotorMasterClkPeriod value and count down. Once the value hits zero a new value is reloaded from the MotorMasterClkPeriod[1:0] registers. This ensures that no master clock glitch is generated when changing the clock period.

[1791] Each of the IO pins for the motor controller is derived from the master counters. Each pin has independent configuration registers. The MotorMasterClkSelect[3:0] registers define which of the two master counters to use as the source for each motor control pin. The master counter value is compared with the configured MotorCtrlLow and MotorCtrlHigh registers (bit fields of the MotorCtrlConfig register). If the count is equal to the MotorCtrlHigh value the motor control pin is set to 1; if the count is equal to the MotorCtrlLow value the motor control pin is set to 0.

[1792] This allows the phase and duty cycle of the motor control pins to be varied at pclk granularity. The motor control generators keep a working copy of the MotorCtrlLow and MotorCtrlHigh values and update the configured value to the working copy when it is safe to do so. This allows the phase or duty cycle of a motor control pin to be safely adjusted by the CPU without causing a glitch on the output pin.

[1793] Note that when reprogramming the MotorCtrlLow and MotorCtrlHigh registers to reorder the sequence of the transition points (e.g. changing from low point less than high point to low point greater than high point and vice versa), care must still be taken to avoid introducing glitching on the output pin.
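The relationship between a master counter and a motor control pin can be sketched as a small behavioural model in Python. It makes several assumptions that the text does not settle: the counter is taken to step through the values period, period-1, ..., 0 before reloading, the MotorCtrlHigh comparison is given priority if both transition points are equal, and the working-copy update mechanism is not modelled. The function and parameter names are illustrative only.

```python
def simulate_motor_pin(period, ctrl_high, ctrl_low, n_pulses, pin=0):
    """Model one motor control pin driven from a down-counting master counter.

    Returns a list of (count, pin) pairs, one per timing pulse.
    """
    count = period                  # counter starts at MotorMasterClkPeriod
    trace = []
    for _ in range(n_pulses):
        if count == ctrl_high:      # low-to-high transition point
            pin = 1
        elif count == ctrl_low:     # high-to-low transition point
            pin = 0
        trace.append((count, pin))
        # Reload once the counter has hit zero, otherwise count down by one.
        count = period if count == 0 else count - 1
    return trace
```

With period 3, MotorCtrlHigh 2 and MotorCtrlLow 0, the pin in this model is high while the counter holds 2 and 1, giving a repeating low, high, high, low pattern; moving both transition points together shifts the phase, and changing their separation alters the duty cycle.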

[1794] 13.3 LED Control

[1795] LED lifetime and brightness can be improved and power consumption reduced by driving the LEDs with a pulsed rather than a DC signal. The source clock for each of the LED pins is a 7.8 kHz (128 μs period) clock generated from the 1 μs clock pulse from the Timers block. The LEDDutySelect registers are used to create a signal with the desired waveform. Unpulsed operation of the LED pins can be achieved by using CPU IO direct control, or by setting LEDDutySelect to 0. By default the LED pins are controlled by the LED control logic.

[1796] 13.4 LSS Interface via GPIO

[1797] In some SoPEC system configurations one or more of the LSS interfaces may not be used. Unused LSS interface pins can be reused as general IO pins by configuring the IOModeSelect registers. When a mode select register for a particular GPIO pin is set to 23, 22, 21 or 20, the GPIO pin is connected to LSS control IOs 3 to 0 respectively.

[1798] 13.5 ISI Interface via GPIO

[1799] In Multi-SoPEC mode the SCB block (in particular the ISI sub-block) requires direct access to and from the GPIO pins. Control of the ISI interface pins is determined by the IOModeSelect registers. When a mode select register for a particular GPIO pin is set to 27, 26, 25 or 24, the GPIO pin is connected to the ISI control bits 3 to 0 respectively. By default the GPIO pins 1 to 0 are directly controlled by the ISI block.

[1800] In single SoPEC systems the pins can be re-used by the GPIO.

[1801] 13.6 CPU GPIO Control

[1802] The CPU can assume direct control of any (or all) of the IO pins individually. On a per pin basis the CPU can turn on direct access to the pin by configuring the IOModeSelect register to CPU direct mode. Once set, the IO pin assumes the direction specified by the CpuIODirection register. When in output mode the value in register CpuIOOut will be directly reflected to the output driver. When in input mode the status of the input pin can be read by reading the CpuIOIn register. When writing to the CpuIOOut register the value being written is XORed with the current value in CpuIOOut. The CPU can also read the status of the 10 selected de-glitched inputs by reading the CpuIOInDeGlitch register.
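Because a write to CpuIOOut is XORed with the current register contents, a written 1 toggles a pin and a written 0 leaves it unchanged. The following Python sketch models that behaviour and shows one way software might compute the value to write in order to reach a desired pin state. The class and method names are hypothetical; only the XOR-on-write rule and the 32-bit width are taken from the description.

```python
class CpuIOOutModel:
    """Behavioural model of the XOR-on-write CpuIOOut register."""

    def __init__(self, reset_value=0x0000_0000):
        self.value = reset_value

    def write(self, data):
        # The stored value becomes the written value XORed with the old value.
        self.value = (self.value ^ data) & 0xFFFF_FFFF
        return self.value

    def drive_to(self, target):
        # Writing (current XOR target) toggles exactly the bits that differ.
        return self.write(self.value ^ target)
```

One consequence of this scheme is that software can flip an individual pin with a single write without first reading the register and masking the other bits.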

[1803] 13.7 Programmable De-Glitching Logic

[1804] Each IO pin can be filtered through a de-glitching logic circuit; the pin that the de-glitching logic is connected to is configured by the InputPinSelect registers. There are 10 de-glitching circuits, so a maximum of 10 input pins can be de-glitched at any time.

[1805] The de-glitch circuit can be configured to sample the IO pin for a predetermined time before concluding that a pin is in a particular state. The exact sampling length is configurable, but each de-glitch circuit must use one of two possible configured values (selected by DeGlitchSelect). The sampling length is the same for both high and low states. The DeGlitchCount is programmed to the number of system time units that a state must be valid for before the state is passed on. The time units are selected by DeGlitchClkSel and can be one of 1 μs, 100 μs, 10 ms and pclk pulses.

[1806] For example, if DeGlitchCount is set to 10 and DeGlitchClkSel is set to 3, then the selected input pin must consistently retain its value for 10 system clock cycles (pclk) before the input state will be propagated from CpuIOIn to CpuIOInDeglitch.
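The de-glitching behaviour described above can be approximated with a simple sample-counting filter. The Python sketch below is a behavioural illustration only: it assumes one sample per selected time unit and assumes the output changes on the sample at which the input has held its value for DeGlitchCount consecutive units, neither of which is specified in the text. The function name is hypothetical.

```python
def deglitch(samples, deglitch_count, initial=0):
    """Filter a sequence of 0/1 samples, one sample per selected time unit.

    The output only follows the input once the input has held the same
    value for `deglitch_count` consecutive samples.
    """
    output = initial
    candidate = initial
    run_length = 0
    filtered = []
    for sample in samples:
        if sample == candidate:
            run_length += 1
        else:
            candidate = sample      # input changed: restart the stable count
            run_length = 1
        if run_length >= deglitch_count:
            output = candidate      # state held long enough, pass it on
        filtered.append(output)
    return filtered
```

With a count of 3, a single-sample dip in an otherwise high input restarts the count and delays the output change, so short pulses never reach the filtered output.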

[1807] 13.8 Interrupt Generation

[1808] Any of the selected input pins (selected by InputPinSelect) can generate an interrupt from the raw or deglitched version of the input pin. There are 10 possible interrupt sources from the GPIO to the interrupt controller, one interrupt per input pin. The InterruptSrcSelect register determines whether the raw input or the deglitched version is used as the interrupt source.

[1809] The interrupt type, masking and priority can be programmed in the interrupt controller.

[1810] 13.9 Frequency Analyser

[1811] The frequency analyser measures the duration between successive positive edges on a selected input pin (selected by InputPinSelect) and reports the last period measured (FreqAnaLastPeriod) and a running average period (FreqAnaAverage).

[1812] The running average is updated each time a new positive edge is detected and is calculated by FreqAnaAverage = (FreqAnaAverage/8)*7 + FreqAnaLastPeriod/8.

[1813] The analyser can be used with any selected input pin (or its deglitched form), but only one input at a time can be selected. The input is selected by the FreqAnaPinSelect (range of 0 to 9) and its deglitched form can be selected by FreqAnaPinFormSelect.
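The running average update is a first-order filter that gives the newest period a weight of 1/8. A short Python sketch of the calculation follows. It assumes truncating integer division, as would be typical of a hardware shift, and a 16-bit result to match the width of FreqAnaAverage; the text itself does not state the rounding behaviour, and the function name is illustrative.

```python
def update_freq_average(average, last_period):
    """Apply FreqAnaAverage = (FreqAnaAverage/8)*7 + FreqAnaLastPeriod/8."""
    new_average = (average // 8) * 7 + last_period // 8
    return new_average & 0xFFFF     # assumed 16-bit register width


def average_after(periods, start=0):
    """Fold a sequence of measured periods into the running average."""
    average = start
    for period in periods:
        average = update_freq_average(average, period)
    return average
```

Under these assumptions a steady input whose period is a multiple of 8 is a fixed point of the update, while a single outlying period moves the average by only about one eighth of the difference.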

[1814] 13.10 Brushless DC (BLDC) Motor Controllers

[1815] The GPIO contains 2 brushless DC (BLDC) motor controllers. Each controller consists of 3 hall inputs, a direction input, and six possible outputs. The outputs are derived from the input state and a pulse width modulated (PWM) input from the Stepper Motor controller, and are given by the truth table in Table 83.

TABLE 83 Truth Table for BLDC Motor Controllers
direction | hc | hb | ha | q6 | q5 | q4 | q3 | q2 | q1
0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | PWM | 0
0 | 0 | 1 | 1 | PWM | 0 | 0 | 1 | 0 | 0
0 | 0 | 1 | 0 | PWM | 0 | 0 | 0 | 0 | 1
0 | 1 | 1 | 0 | 0 | 0 | PWM | 0 | 0 | 1
0 | 1 | 0 | 0 | 0 | 1 | PWM | 0 | 0 | 0
0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | PWM | 0
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 1 | 0 | 0 | PWM | 0 | 0 | 1
1 | 0 | 1 | 1 | PWM | 0 | 0 | 0 | 0 | 1
1 | 0 | 1 | 0 | PWM | 0 | 0 | 1 | 0 | 0
1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | PWM | 0
1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | PWM | 0
1 | 1 | 0 | 1 | 0 | 1 | PWM | 0 | 0 | 0
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0

[1816] All inputs to a BLDC controller must be de-glitched. Each controller has its inputs hardwired to de-glitch circuits. Controller 1 hall inputs are de-glitched by circuits 2 to 0, and its direction input is de-glitched by circuit 3. Controller 2 inputs are de-glitched by circuits 6 to 4 for hall inputs and 7 for direction input.

[1817] Each controller also requires a PWM input. The stepper motor controller outputs are reused: output 0 is connected to BLDC controller 1, and output 1 to BLDC controller 2.

[1818] The controllers have two modes of operation, internal and external direction control (configured by BLDCMode). If a controller is in external direction mode the direction input is taken from a de-glitched circuit; if it is in internal direction mode the direction input is configured by the BLDCDirection register.

[1819] The BLDC controller outputs are connected to the GPIO output pins by configuring the IOModeSelect register for each pin, e.g. setting the mode register to 8 will connect q1 of Controller 1 to drive the pin.
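The truth table for the BLDC motor controllers lends itself to a simple lookup. The Python sketch below encodes the twelve rows of Table 83 that have an active output, as read from the flattened table in this text, and treats the two hall input combinations 000 and 111 as producing all-zero outputs for either direction. The names TABLE_83 and bldc_outputs are illustrative and the row values should be checked against the original table before being relied upon.

```python
PWM = "PWM"  # marker for an output that follows the PWM input

# Key: (direction, hc, hb, ha); value: (q6, q5, q4, q3, q2, q1).
TABLE_83 = {
    (0, 0, 0, 1): (0, 0, 0, 1, PWM, 0),
    (0, 0, 1, 1): (PWM, 0, 0, 1, 0, 0),
    (0, 0, 1, 0): (PWM, 0, 0, 0, 0, 1),
    (0, 1, 1, 0): (0, 0, PWM, 0, 0, 1),
    (0, 1, 0, 0): (0, 1, PWM, 0, 0, 0),
    (0, 1, 0, 1): (0, 1, 0, 0, PWM, 0),
    (1, 0, 0, 1): (0, 0, PWM, 0, 0, 1),
    (1, 0, 1, 1): (PWM, 0, 0, 0, 0, 1),
    (1, 0, 1, 0): (PWM, 0, 0, 1, 0, 0),
    (1, 1, 1, 0): (0, 0, 0, 1, PWM, 0),
    (1, 1, 0, 0): (0, 1, 0, 0, PWM, 0),
    (1, 1, 0, 1): (0, 1, PWM, 0, 0, 0),
}


def bldc_outputs(direction, hc, hb, ha, pwm):
    """Return (q6, q5, q4, q3, q2, q1) for the given inputs and PWM level."""
    row = TABLE_83.get((direction, hc, hb, ha), (0, 0, 0, 0, 0, 0))
    return tuple(pwm if q == PWM else q for q in row)
```

In every active row exactly one output is held at 1 and one follows the PWM input, so at most two outputs are driven at any time.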

[1820] 13.11 Implementation

[1821] 13.11.1 Definitions of I/O

TABLE 84 I/O definition
Port name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | System Clock
prst_n | 1 | In | System reset, synchronous active low
tim_pulse[2:0] | 3 | In | Timers block generated timing pulses. 0 - 1 μs pulse; 1 - 100 μs pulse; 2 - 10 ms pulse
CPU Interface
cpu_adr[8:2] | 8 | In | CPU address bus. Only 7 bits are required to decode the address space for this block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
gpio_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_gpio_sel | 1 | In | Block select from the CPU. When cpu_gpio_sel is high both cpu_adr and cpu_dataout are valid
gpio_cpu_rdy | 1 | Out | Ready signal to the CPU. When gpio_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the GPIO block and for a read cycle this means the data on gpio_cpu_data is valid.
gpio_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
gpio_cpu_debug_valid | 1 | Out | Debug Data valid on gpio_cpu_data bus. Active high
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
IO Pins
gpio_o[31:0] | 32 | Out | General purpose IO output to IO driver
gpio_i[31:0] | 32 | In | General purpose IO input from IO receiver
gpio_e[31:0] | 32 | Out | General purpose IO output control. Active high driving
GPIO to LSS
lss_gpio_dout[1:0] | 2 | In | LSS bus data output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
gpio_lss_din[1:0] | 2 | Out | LSS bus data input. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_e[1:0] | 2 | In | LSS bus data output enable, active high. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_clk[1:0] | 2 | In | LSS bus clock output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
GPIO to ISI
gpio_isi_din[1:0] | 2 | Out | Input data from IO receivers to ISI.
isi_gpio_dout[1:0] | 2 | In | Data output from ISI to IO drivers
isi_gpio_e[1:0] | 2 | In | GPIO ISI pins output enable (active high) from ISI interface
usbh_gpio_power_en | 1 | In | Port Power enable from the USB host core, active high
gpio_usbh_over_current | 1 | Out | Over current detect to the USB host core, active high
Miscellaneous
gpio_icu_irq[9:0] | 10 | Out | GPIO pin interrupts
gpio_cpr_wakeup | 1 | Out | SoPEC wakeup to the CPR block, active high.
Debug
debug_data_out[31:0] | 32 | In | Output debug data to be muxed on to the GPIO pins
debug_cntrl[31:0] | 32 | In | Control signal for each GPIO bound debug data line indicating whether or not the debug data should be selected by the pin mux

[1822] 13.11.2 Configuration Registers

[1823] The configuration registers in the GPIO are programmed via the CPU interface. Refer to section 11.4.3 on page 69 for a description of the protocol and timing diagrams for reading and writing registers in the GPIO. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the GPIO. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of gpio_cpu_data. Table 85 lists the configuration registers in the GPIO block.

TABLE 85 GPIO Register Definition
Address (GPIO_base +) | Register | #bits | Reset | Description
0x000-0x07C | IOModeSelect[31:0] | 32 × 5 | See Table for default values | Specifies the mode of operation for each GPIO pin. One 5 bit bus per pin. Possible assignment values and corresponding controller outputs are as follows (Value - Controlled by): 3 to 0 - Output, LED controller 4 to 1; 7 to 4 - Output, Stepper Motor control 4-1; 13 to 8 - Output, BLDC 1 Motor control 6-1; 19 to 14 - Output, BLDC 2 Motor control 6-1; 23 to 20 - LSS control 4-1; 27 to 24 - ISI control 4-1; 28 - CPU Direct Control; 29 - USB power enable output; 30 - Input Mode
0x080-0x0A4 | InputPinSelect[9:0] | 10 × 5 | 0x00 | Specifies which pins should be selected as inputs. Used to select the pin source to the DeGlitch Circuits.
CPU IO Control
0x0B0 | CpuIOUserModeMask | 32 | 0x0000_0000 | User Mode Access Mask to CPU GPIO control register. When 1 user access is enabled. One bit per gpio pin. Enables access to CpuIODirection, CpuIOOut and CpuIOIn in user mode.
0x0B4 | CpuIOSuperModeMask | 32 | 0xFFFF_FFFF | Supervisor Mode Access Mask to CPU GPIO control register. When 1 supervisor access is enabled. One bit per gpio pin. Enables access to CpuIODirection, CpuIOOut and CpuIOIn in supervisor mode.
0x0B8 | CpuIODirection | 32 | 0x0000_0000 | Indicates the direction of each IO pin, when controlled by the CPU. 0 - Indicates Input Mode; 1 - Indicates Output Mode
0x0BC | CpuIOOut | 32 | 0x0000_0000 | Value used to drive output pin in CPU direct mode. bits 31:0 - Value to drive on output GPIO pins. When written to, the register assumes the new value XORed with the current value.
0x0C0 | CpuIOIn | 32 | External pin value | Value received on each input pin regardless of mode. Read Only register.
0x0C4 | CpuDeGlitchUserModeMask | 10 | 0x000 | User Mode Access Mask to CpuIOInDeglitch control register. When 1 user access is enabled, otherwise bit reads as zero.
0x0C8 | CpuIOInDeglitch | 10 | 0x000 | Deglitched version of selected input pins. The input pins are selected by the InputPinSelect register. Note that after reset this register will reflect the external pin values 256 pclk cycles after they have stabilized. Read Only register.
Deglitch control
0x0D0-0x0D4 | DeGlitchCount[1:0] | 2 × 8 | 0xFF | Deglitch circuit sample count in DeGlitchClkSrc selected units.
0x0D8-0x0DC | DeGlitchClkSrc[1:0] | 2 × 2 | 0x3 | Specifies the unit used by the GPIO deglitch circuits: 0 - 1 μs pulse; 1 - 100 μs pulse; 2 - 10 ms pulse; 3 - pclk
0x0E0 | DeGlitchSelect | 10 | 0x000 | Specifies which deglitch count (DeGlitchCount) and unit select (DeGlitchClkSrc) should be used with each de-glitch circuit. 0 - Specifies DeGlitchCount[0] and DeGlitchClkSrc[0]; 1 - Specifies DeGlitchCount[1] and DeGlitchClkSrc[1]
Motor Control
0x0E4 | MotorCtrlUserModeEnable | 1 | 0x0 | User Mode Access enable to Motor control configuration registers. When 1 user access is enabled. Enables user access to MotorMasterClkPeriod, MotorMasterClkSrc, MotorDutySelect, MotorPhaseSelect, MotorMasterClockEnable, MotorMasterClkSelect, BLDCMode and BLDCDirection registers
0x0E8-0x0EC | MotorMasterClkPeriod[1:0] | 2 × 16 | 0x0000 | Specifies the motor controller master clock periods in MotorMasterClkSrc selected units
0x0F0 | MotorMasterClkSrc | 2 | 0x0 | Specifies the unit used by the motor controller master clock generator: 0 - 1 μs pulse; 1 - 100 μs pulse; 2 - 10 ms pulse; 3 - pclk
0x0F4-0x100 | MotorCtrlConfig[3:0] | 4 × 32 | 0x0000_0000 | Specifies the transition points in the clock period for each motor control pin. One register per pin. bits 15:0 - MotorCtrlLow, high to low transition point; bits 31:16 - MotorCtrlHigh, low to high transition point
0x104 | MotorMasterClkSelect | 4 | 0x0 | Specifies which motor master clock should be used as a pin generator source. 0 - Clock derived from MotorMasterClockPeriod[0]; 1 - Clock derived from MotorMasterClockPeriod[1]
0x108 | MotorMasterClockEnable | 2 | 0x0 | Enable the motor master clock counter. When 1 count is enabled. Bit 0 - Enable motor master clock 0; Bit 1 - Enable motor master clock 1
BLDC Motor Controllers
0x10C | BLDCMode | 2 | 0x0 | Specifies the Mode of operation of the BLDC Controller. One bit per Controller. 0 - External direction control; 1 - Internal direction control
0x110 | BLDCDirection | 2 | 0x0 | Specifies the direction input of the BLDC controller. Only used when the BLDC controller is in internal direction control mode. One bit per controller.
LED control
0x114 | LEDCtrlUserModeEnable | 4 | 0x0 | User Mode Access enable to LED control configuration registers. When 1 user access is enabled. One bit per LEDDutySelect select register.
0x118-0x124 | LEDDutySelect[3:0] | 4 × 3 | 0x0 | Specifies the duty cycle for each LED control output. See FIG. 54 for encoding details. The LEDDutySelect[3:0] registers determine the duty cycle of the LED controller outputs
Frequency Analyser
0x130 | FreqAnaUserModeEnable | 1 | 0x0 | User Mode Access enable to Frequency analyser configuration registers. When 1 user access is enabled. Controls access to FreqAnaPinFormSelect, FreqAnaLastPeriod, FreqAnaAverage and FreqAnaCountInc.
0x134 | FreqAnaPinSelect | 4 | 0x00 | Selects which selected input should be used for the frequency analyser.
0x138 | FreqAnaPinFormSelect | 1 | 0x0 | Selects if the frequency analyser should use the raw input or the deglitched form. 0 - Deglitched form of input pin; 1 - Raw form of input pin
0x13C | FreqAnaLastPeriod | 16 | 0x0000 | Frequency Analyser last period of selected input pin.
0x140 | FreqAnaAverage | 16 | 0x0000 | Frequency Analyser average period of selected input pin.
0x144 | FreqAnaCountInc | 20 | 0x00000 | Frequency Analyser counter increment amount. For each clock cycle in which no edge is detected on the selected input pin the accumulator is incremented by this amount.
0x148 | FreqAnaCount | 32 | 0x0000_0000 | Frequency Analyser running counter (Working register)
Miscellaneous
0x150 | InterruptSrcSelect | 10 | 0x3FF | Interrupt source select. 1 bit per selected input. Determines whether the interrupt source is direct from the selected input pin or the deglitched version. Input pins are selected by the DeGlitchPinSelect register. 0 - Selected input direct; 1 - Deglitched selected input
0x154 | DebugSelect[8:2] | 7 | 0x00 | Debug address select. Indicates the address of the register to report on the gpio_cpu_data bus when it is not otherwise being used.
0x158-0x15C | MotorMasterCount[1:0] | 2 × 16 | 0x0000 | Motor master clock counter values. Bus 0 - Master clock count 0; Bus 1 - Master clock count 1. Read Only registers
0x160 | WakeUpInputMask | 10 | 0x000 | Indicates which deglitched inputs should be considered to generate the CPR wakeup. Active high
0x164 | WakeUpLevel | 1 | 0 | Defines the level to detect on the masked GPIO inputs to generate a wakeup to the CPR. 0 - Level 0; 1 - Level 1
0x168 | USBOverCurrentPinSelect | 4 | 0x00 | Selects which deglitched input should be used for the USB over current detect.

[1824] 13.11.2.1 Supervisor and User Mode Access

[1825] The configuration registers block examines the CPU access type (cpu_acode signal) and determines if the access is allowed to that particular register, based on configured user access registers. If an access is not allowed the GPIO will issue a bus error by asserting the gpio_cpu_berr signal.

[1826] All supervisor and user program mode accesses will result in a bus error.

[1827] Access to the CpuIODirection, CpuIOOut and CpuIOIn is filtered by the CpuIOUserModeMask and CpuIOSuperModeMask registers. Each bit masks access to the corresponding bits in the CpuIO* registers for each mode, with CpuIOUserModeMask filtering user data mode access and CpuIOSuperModeMask filtering supervisor data mode access.

[1828] The addition of the CpuIOSuperModeMask register helps prevent potential conflicts between user and supervisor code read modify write operations. For example a conflict could exist if the user code is interrupted during a read modify write operation by a supervisor ISR which also modifies the CpuIO* registers.

[1829] An attempt to write to a disabled bit in user or supervisor mode will be ignored, and an attempt to read a disabled bit returns zero. If there are no user mode enabled bits then access is not allowed in user mode and a bus error will result. Similarly for supervisor mode.

[1830] When writing to the CpuIOOut register, the value being written is XORed with the current value in the CpuIOOut register, and the result is reflected on the GPIO pins.

[1831] The pseudocode for determining access to the CpuIOOut register is shown below. Similar code could be shown for the CpuIODirection and CpuIOIn registers. Note that when writing to CpuIODirection data is deposited directly and not XORed with the existing data (as in the CpuIOOut case).

    if (cpu_acode == SUPERVISOR_DATA_MODE) then        // supervisor mode
        if (CpuIOSuperModeMask[31:0] == 0) then
            // access is denied, and bus error
            gpio_cpu_berr = 1
        elsif (cpu_rwn == 1) then
            // read mode (no filtering needed)
            gpio_cpu_data[31:0] = CpuIOOut[31:0]
        else
            // write mode, filtered by mask
            mask[31:0] = (cpu_dataout[31:0] & CpuIOSuperModeMask[31:0])
            CpuIOOut[31:0] = (CpuIOOut[31:0] ^ mask[31:0])    // bitwise XOR with the current value
    elsif (cpu_acode == USER_DATA_MODE) then           // user data mode
        if (CpuIOUserModeMask[31:0] == 0) then
            // access is denied, and bus error
            gpio_cpu_berr = 1
        elsif (cpu_rwn == 1) then
            // read mode, filtered by mask
            gpio_cpu_data = (CpuIOOut[31:0] & CpuIOUserModeMask[31:0])
        else
            // write mode, filtered by mask
            mask[31:0] = (cpu_dataout[31:0] & CpuIOUserModeMask[31:0])
            CpuIOOut[31:0] = (CpuIOOut[31:0] ^ mask[31:0])    // bitwise XOR with the current value
    else
        // access is denied, bus error
        gpio_cpu_berr = 1
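The access filtering described for CpuIOOut can be modelled in software as a rough check of the intended behaviour. The following Python sketch is illustrative only: the function name and return tuple are hypothetical, the cpu_acode encodings are assumed to match those listed for the ICU port definitions (01 user data, 11 supervisor data), and the write is assumed to XOR the masked data with the current register value, as the prose states.

```python
# Illustrative model only; names and the return tuple are assumptions.
USER_DATA_MODE = 0b01        # assumed cpu_acode encoding (user data access)
SUPERVISOR_DATA_MODE = 0b11  # assumed cpu_acode encoding (supervisor data access)


def cpu_io_out_access(acode, is_read, current, data, user_mask, super_mask):
    """Return (new_cpu_io_out, read_data, bus_error) for one CPU access."""
    if acode == SUPERVISOR_DATA_MODE:
        mask = super_mask & 0xFFFFFFFF
    elif acode == USER_DATA_MODE:
        mask = user_mask & 0xFFFFFFFF
    else:
        return current, 0, True          # program-mode accesses: bus error
    if mask == 0:
        return current, 0, True          # no enabled bits in this mode: bus error
    if is_read:
        # The pseudocode leaves supervisor reads unfiltered and masks user reads.
        read = current if acode == SUPERVISOR_DATA_MODE else current & mask
        return current, read, False
    toggles = data & mask                # writes to disabled bits are ignored
    return (current ^ toggles) & 0xFFFFFFFF, 0, False
```

Because the write toggles rather than overwrites, user code can flip only the pins it owns without a read modify write sequence.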

[1832] Table 86 details the access modes allowed for registers in the GPIO block. In supervisor mode all registers are accessible. In user mode forbidden accesses will result in a bus error (gpio_cpu_berr asserted).

TABLE 86 GPIO supervisor and user access modes
Register Address | Registers | Access Permitted
0x000-0x07C | IOModeSelect[31:0] | Supervisor data mode only
0x080-0x94 | InputPinSelect[9:0] | Supervisor data mode only
CPU IO Control
0x0B0 | CpuIOUserModeMask | Supervisor data mode only
0x0B4 | CpuIOSuperModeMask | Supervisor data mode only
0x0B8 | CpuIODirection | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x0BC | CpuIOOut | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x0C0 | CpuIOIn | CpuIOUserModeMask and CpuIOSuperModeMask filtered
0x0C4 | CpuDeGlitchUserModeMask | Supervisor data mode only
0x0C8 | CpuIOInDeglitch | CpuDeGlitchUserModeMask filtered. Unrestricted Supervisor data mode access
Deglitch control
0x0D0-0x0D4 | DeGlitchCount[1:0] | Supervisor data mode only
0x0D8-0x0DC | DeGlitchClkSrc[1:0] | Supervisor data mode only
0x0E0 | DeGlitchSelect | Supervisor data mode only
Motor Control
0x0E4 | MotorCtrlUserModeEnable | Supervisor data mode only
0x0E8-0x0EC | MotorMasterClkPeriod[1:0] | MotorCtrlUserModeEnable enabled
0x0F0 | MotorMasterClkSrc | MotorCtrlUserModeEnable enabled
0x0F4-0x100 | MotorCtrlConfig[3:0] | MotorCtrlUserModeEnable enabled
0x104 | MotorMasterClkSelect | MotorCtrlUserModeEnable enabled
0x108 | MotorMasterClockEnable | MotorCtrlUserModeEnable enabled
BLDC Motor Controllers
0x10C | BLDCMode | MotorCtrlUserModeEnable enabled
0x110 | BLDCDirection | MotorCtrlUserModeEnable enabled
LED control
0x114 | LEDCtrlUserModeEnable | Supervisor data mode only
0x118-0x124 | LEDDutySelect[3:0] | LEDCtrlUserModeEnable[3:0] enabled
Frequency Analyser
0x130 | FreqAnaUserModeEnable | Supervisor data mode only
0x134 | FreqAnaPinSelect | FreqAnaUserModeEnable enabled
0x138 | FreqAnaPinFormSelect | FreqAnaUserModeEnable enabled
0x13C | FreqAnaLastPeriod | FreqAnaUserModeEnable enabled
0x140 | FreqAnaAverage | FreqAnaUserModeEnable enabled
0x144 | FreqAnaCountInc | FreqAnaUserModeEnable enabled
0x148 | FreqAnaCount | FreqAnaUserModeEnable enabled
Miscellaneous
0x150 | InterruptSrcSelect | Supervisor data mode only
0x154 | DebugSelect[8:2] | Supervisor data mode only
0x158-0x15C | MotorMasterCount[1:0] | Supervisor data mode only
0x160 | WakeUpInputMask | Supervisor data mode only
0x164 | WakeUpLevel | Supervisor data mode only
0x168 | USBOverCurrentPinSelect | Supervisor data mode only

[1833] 13.11.3 GPIO Partition

[1834] 13.11.4 IO Control

[1835] The IO control block connects the IO pin drivers to internal signalling based on configured setup registers and debug control signals.

    // Output Control
    for (i=0; i<32; i++) {
        if (debug_cntrl[i] == 1) then    // debug mode
            gpio_e[i] = 1; gpio_o[i] = debug_data_out[i]
        else                             // normal mode
            case io_mode_select[i] is
                0 : gpio_e[i] = 1; gpio_o[i] = led_ctrl[0]          // LED output 1
                1 : gpio_e[i] = 1; gpio_o[i] = led_ctrl[1]          // LED output 2
                2 : gpio_e[i] = 1; gpio_o[i] = led_ctrl[2]          // LED output 3
                3 : gpio_e[i] = 1; gpio_o[i] = led_ctrl[3]          // LED output 4
                4 : gpio_e[i] = 1; gpio_o[i] = motor_ctrl[0]        // Stepper Motor Control 1
                5 : gpio_e[i] = 1; gpio_o[i] = motor_ctrl[1]        // Stepper Motor Control 2
                6 : gpio_e[i] = 1; gpio_o[i] = motor_ctrl[2]        // Stepper Motor Control 3
                7 : gpio_e[i] = 1; gpio_o[i] = motor_ctrl[3]        // Stepper Motor Control 4
                8 : gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[0][0]      // BLDC Motor Control 1, output 1
                9 : gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[0][1]      // BLDC Motor Control 1, output 2
                10: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[0][2]      // BLDC Motor Control 1, output 3
                11: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[0][3]      // BLDC Motor Control 1, output 4
                12: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[0][4]      // BLDC Motor Control 1, output 5
                13: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[0][5]      // BLDC Motor Control 1, output 6
                14: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[1][0]      // BLDC Motor Control 2, output 1
                15: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[1][1]      // BLDC Motor Control 2, output 2
                16: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[1][2]      // BLDC Motor Control 2, output 3
                17: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[1][3]      // BLDC Motor Control 2, output 4
                18: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[1][4]      // BLDC Motor Control 2, output 5
                19: gpio_e[i] = 1; gpio_o[i] = bldc_ctrl[1][5]      // BLDC Motor Control 2, output 6
                20: gpio_e[i] = 1; gpio_o[i] = lss_gpio_clk[0]      // LSS Clk 0
                21: gpio_e[i] = 1; gpio_o[i] = lss_gpio_clk[1]      // LSS Clk 1
                22: gpio_e[i] = lss_gpio_e[0]; gpio_o[i] = lss_gpio_dout[0];    // LSS Data 0
                    gpio_lss_din[0] = gpio_i[i]
                23: gpio_e[i] = lss_gpio_e[1]; gpio_o[i] = lss_gpio_dout[1];    // LSS Data 1
                    gpio_lss_din[1] = gpio_i[i]
                24: gpio_e[i] = isi_gpio_e[0]; gpio_o[i] = isi_gpio_dout[0];    // ISI Control 1
                    gpio_isi_din[0] = gpio_i[i]
                25: gpio_e[i] = isi_gpio_e[1]; gpio_o[i] = isi_gpio_dout[1];    // ISI Control 2
                    gpio_isi_din[1] = gpio_i[i]
                26: gpio_e[i] = isi_gpio_e[2]; gpio_o[i] = isi_gpio_dout[2];    // ISI Control 3
                    gpio_isi_din[2] = gpio_i[i]
                27: gpio_e[i] = isi_gpio_e[3]; gpio_o[i] = isi_gpio_dout[3];    // ISI Control 4
                    gpio_isi_din[3] = gpio_i[i]
                28: gpio_e[i] = cpu_io_dir[i]; gpio_o[i] = cpu_io_out[i];       // CPU Direct
                29: gpio_e[i] = 1; gpio_o[i] = usbh_gpio_power_en               // USB host power enable
                30: gpio_e[i] = 0; gpio_o[i] = 0                                // Input only mode
            end case
        // all gpio are always readable by the CPU
        cpu_io_in[i] = gpio_i[i];
    }

[1836] The input selection pseudocode, which determines which pin connects to which de-glitch circuit, is as follows:

    for (i=0; i<10; i++) {
        pin_num = input_pin_select[i]
        deglitch_input[i] = gpio_i[pin_num]
    }

[1837] The gpio_usbh_over_current output to the USB core is driven by a selected deglitched input (configured by the USBOverCurrentPinSelect register).

[1838] index=USBOverCurrentPinSelect

[1839] gpio_usbh_over_current=cpu_io_in_deglitch[index]

[1840] 13.11.5 Wakeup Generator

[1841] The wakeup generator compares the deglitched inputs with the configured mask (WakeUpInputMask) and level (WakeUpLevel), and determines whether to generate a wakeup to the CPR block.

    wakeup = 0
    for (i=0; i<10; i++) {
        if (wakeup_level == 0) then    // level 0 active
            wakeup = wakeup OR (wakeup_input_mask[i] AND NOT cpu_io_in_deglitch[i])
        else                           // level 1 active
            wakeup = wakeup OR (wakeup_input_mask[i] AND cpu_io_in_deglitch[i])
    }
    // assign the output
    gpio_cpr_wakeup = wakeup
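The per-bit loop of the wakeup generator reduces to a single masked comparison over the ten deglitched inputs. The Python sketch below shows one way to express that reduction; the function name and integer bit-vector representation are assumptions made for illustration.

```python
def gpio_wakeup(deglitched, input_mask, level):
    """Return 1 if any masked deglitched input sits at the configured level.

    deglitched and input_mask are 10-bit integers (bit i = selected input i);
    level is the WakeUpLevel value (0 or 1).
    """
    bits = deglitched if level == 1 else ~deglitched
    return int((bits & input_mask & 0x3FF) != 0)
```

With an all-zero mask the function never reports a wakeup, which matches the active-high mask described for WakeUpInputMask.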

[1842] 13.11.6 LED Pulse Generator

[1843] The pulse generator logic consists of a 7-bit counter that is incremented on a 1 μs pulse from the timers block (tim_pulse[0]). The LED control signal is generated by comparing the count value with the configured duty cycle for the LED (led_duty_sel).

[1844] The logic is given by:

    for (i=0; i<4; i++) {    // for each LED pin
        // period divided into 8 segments
        period_div8 = cnt[6:4];
        if (period_div8 < led_duty_sel[i]) then
            led_ctrl[i] = 1
        else
            led_ctrl[i] = 0
    }
    // update the counter every 1 μs pulse
    if (tim_pulse[0] == 1) then
        cnt++
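To see how the duty select value maps to an on-time fraction, the comparison can be evaluated over one full period of the 7-bit counter. This Python sketch follows the pseudocode only (the encoding detailed in FIG. 54 is not reproduced here), and the function names are hypothetical.

```python
def led_ctrl(cnt, duty_sel):
    """LED output for a 7-bit counter value and a 3-bit duty select."""
    period_div8 = (cnt >> 4) & 0x7      # cnt[6:4]: which eighth of the period
    return 1 if period_div8 < duty_sel else 0


def duty_fraction(duty_sel):
    """Fraction of the 128-count period for which the LED output is high."""
    return sum(led_ctrl(c, duty_sel) for c in range(128)) / 128
```

Under this model a duty select of N gives an output that is high for N/8 of each 128 μs period, so the 3-bit field cannot select a permanently high output.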

[1845] 13.11.7 Stepper Motor Control

[1846] The motor controller consists of 2 counters, and 4 phase generator logic blocks, one per motor control pin. The counters decrement each time a timing pulse (cnt_en) is received. The counters start at the configured clock period value (motor_mas_clk_period) and decrement to zero. If the counters are enabled (via motor_mas_clk_enable), the counters will automatically restart at the configured clock period value, otherwise they will wait until the counters are re-enabled.

[1847] The timing pulse period is one of pclk, 1 μs, 100 μs or 10 ms, as selected by the MotorMasterClkSrc register. The counters are used to derive the phase and duty cycle of each motor control pin.

    // decrement logic
    if (cnt_en == 1) then
        if ((mas_cnt == 0) AND (motor_mas_clk_enable == 1)) then
            mas_cnt = motor_mas_clk_period[15:0]
        elsif ((mas_cnt == 0) AND (motor_mas_clk_enable == 0)) then
            mas_cnt = 0
        else
            mas_cnt--
    else
        // hold the value
        mas_cnt = mas_cnt

[1848] The phase generator block generates the motor control logic based on the selected clock generator (motor_mas_clk_sel), the motor control high transition point (curr_motor_ctrl_high) and the motor control low transition point (curr_motor_ctrl_low).

[1849] The phase generator maintains current copies of the motor_ctrl_config configuration value (motor_ctrl_config[31:16] becomes curr_motor_ctrl_high and motor_ctrl_config[15:0] becomes curr_motor_ctrl_low). It updates these values to the current register values when it is safe to do so without causing a glitch on the output motor pin.

[1850] Note that when reprogramming the motor_ctrl_config register to reorder the sequence of the transition points (e.g. changing from low point less than high point to low point greater than high point and vice versa) care must be taken to avoid introducing glitching on the output pin.

[1851] There are 4 instances, one per motor control pin.

[1852] The logic is given by:

    // select the input counter to use
    if (motor_mas_clk_sel == 1) then
        count = mas_cnt[1]
    else
        count = mas_cnt[0]
    // Generate the phase and duty cycle
    if (count == curr_motor_ctrl_low) then
        motor_ctrl = 0
    elsif (count == curr_motor_ctrl_high) then
        motor_ctrl = 1
    else
        motor_ctrl = motor_ctrl    // remain the same
    // update the current registers at period boundary
    if (count == 0) then
        curr_motor_ctrl_high = motor_ctrl_config[31:16]    // update to new high value
        curr_motor_ctrl_low = motor_ctrl_config[15:0]      // update to new low value
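The interaction between the down-counting master clock and the two transition points can be illustrated with a small simulation. This is a simplified Python sketch under stated assumptions: one timing pulse per loop iteration, the master clock always enabled, and fixed transition points (the period-boundary update of the curr_motor_ctrl values is omitted). The function name is hypothetical.

```python
def simulate_motor_pin(period, low_point, high_point, ticks):
    """Return the motor pin level seen on each of `ticks` timing pulses."""
    count = period          # counter starts at the configured period value
    pin = 0
    trace = []
    for _ in range(ticks):
        if count == low_point:
            pin = 0         # high to low transition point
        elif count == high_point:
            pin = 1         # low to high transition point
        trace.append(pin)
        # decrement to zero, then restart at the period value
        count = period if count == 0 else count - 1
    return trace
```

For example, with a period value of 7, a high point of 6 and a low point of 2, the pin is high for four of every eight timing pulses; moving either point shifts the phase or changes the duty cycle.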

[1853] 13.11.8 Input Deglitch

[1854] The input deglitch logic rejects input states of duration less than the configured number of time units (deglitch_cnt); input states of greater duration are reflected on the output cpu_io_in_deglitch. The time unit used by the deglitch circuit (pclk, 1 μs, 100 μs or 10 ms) is selected by the deglitch_clk_src bus.

[1855] There are 2 possible sets of deglitch_cnt and deglitch_clk_src that can be used to deglitch the input pins. The values used are selected by the deglitch_sel signal.

[1856] There are 10 deglitch circuits in the GPIO. Any GPIO pin can be connected to a deglitch circuit. Pins are selected for deglitching by the InputPinSelect registers.

[1857] Each selected input can be used to generate an interrupt. The interrupt can be generated from the raw input signal (deglitch_input) or a deglitched version of the input (cpu_io_in_deglitch). The interrupt source is selected by the interrupt_src_select signal.

[1858] The counter logic is given by:

    if (deglitch_input != deglitch_input_delay) then
        cnt = deglitch_cnt
        output_en = 0
    elsif (cnt == 0) then
        cnt = cnt
        output_en = 1
    elsif (cnt_en == 1) then
        cnt--
        output_en = 0
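The effect of the deglitch counter on an input stream can be sketched as follows. This Python model is illustrative rather than definitive: it assumes one timing pulse per sample (cnt_en always 1) and that the output register captures the input whenever output_en is 1, a step the pseudocode does not show explicitly. The function name is hypothetical.

```python
def deglitch(samples, deglitch_cnt, initial=0):
    """Return the deglitched output level for each input sample."""
    out = initial
    prev = initial            # deglitch_input_delay
    cnt = deglitch_cnt
    result = []
    for s in samples:
        if s != prev:
            cnt = deglitch_cnt    # input changed: restart the count
        elif cnt == 0:
            out = s               # input stable long enough: pass it through
        else:
            cnt -= 1
        prev = s
        result.append(out)
    return result
```

An input that toggles on every sample never reaches the output, while a level held for longer than deglitch_cnt samples eventually does.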

[1859] 13.11.9 Frequency Analyser

[1860] The frequency analyser block monitors a selected deglitched input (cpu_io_in_deglitch) or a direct selected input (deglitch_input) and detects positive edges. The selected input is configured by the FreqAnaPinSelect and FreqAnaPinFormSelect registers. Between successive positive edges detected on the input it increments a counter (FreqAnaCount) by a programmed amount (FreqAnaCountInc) on each clock cycle. When a positive edge is detected the FreqAnaLastPeriod register is updated with the top 16 bits of the counter and the counter is reset. The frequency analyser also maintains a running average of the FreqAnaLastPeriod register. Each time a positive edge is detected on the input the FreqAnaAverage register is updated with the new calculated FreqAnaLastPeriod. The average is calculated as ⅞ of the current value plus ⅛ of the new value. The FreqAnaLastPeriod, FreqAnaCount and FreqAnaAverage registers can be written to by the CPU.

[1861] The pseudocode is given by:

    if ((pin == 1) AND (pin_delay == 0)) then    // positive edge detected
        freq_ana_lastperiod[15:0] = freq_ana_count[31:16]
        freq_ana_average[15:0] = freq_ana_average[15:0] - freq_ana_average[15:3] + freq_ana_lastperiod[15:3]
        freq_ana_count[31:0] = 0
    else
        freq_ana_count[31:0] = freq_ana_count[31:0] + freq_ana_count_inc[19:0]
    // implement the configuration register write
    if (wr_last_en == 1) then
        freq_ana_lastperiod = wr_data
    elsif (wr_average_en == 1) then
        freq_ana_average = wr_data
    elsif (wr_freq_count_en == 1) then
        freq_ana_count = wr_data
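The running average uses shifts in place of a divide: subtracting average[15:3] removes one eighth of the old value, and adding lastperiod[15:3] contributes one eighth of the new one. A minimal Python sketch of the edge handling is shown below. The class and attribute names are hypothetical, the CPU write path is omitted, and the average is assumed to use the newly captured period, as the prose describes.

```python
class FreqAnalyser:
    """Illustrative model of the frequency analyser counter and average."""

    def __init__(self, count_inc):
        self.count_inc = count_inc & 0xFFFFF   # 20-bit increment
        self.count = 0                         # 32-bit working counter
        self.last_period = 0
        self.average = 0
        self.pin_delay = 0                     # pin value on the previous clock

    def clock(self, pin):
        if pin == 1 and self.pin_delay == 0:   # positive edge detected
            self.last_period = (self.count >> 16) & 0xFFFF
            self.average = (self.average - (self.average >> 3)
                            + (self.last_period >> 3)) & 0xFFFF
            self.count = 0
        else:
            self.count = (self.count + self.count_inc) & 0xFFFFFFFF
        self.pin_delay = pin
```

With an increment of 0x10000 the top 16 bits of the counter advance by one per clock, so the captured period is simply the number of clocks counted between edges.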

[1862] 13.11.10 BLDC Motor Controller

[1863] The BLDC controller logic is identical for both instances; only the input connections are different. The logic implements the truth table shown in Table. The six q outputs are derived combinationally from the direction, ha, hb, hc and pwm inputs. The direction input has 2 possible sources selected by the mode. The pseudocode is as follows:

    // determine if in internal or external direction mode
    if (mode == 1) then    // internal mode
        direction = int_direction
    else                   // external mode
        direction = ext_direction

[1864] 14 Interrupt Controller Unit (ICU)

[1865] The interrupt controller accepts up to N input interrupt sources, determines their priority, arbitrates based on the highest priority and generates an interrupt request to the CPU. The ICU complies with the interrupt acknowledge protocol of the CPU. Once the CPU accepts an interrupt (i.e. processing of its service routine begins) the interrupt controller will assert the next arbitrated interrupt if one is pending.

[1866] Each interrupt source has a fixed vector number N, and an associated configuration register, IntReg[N]. The format of the IntReg[N] register is shown in Table 87 below.

TABLE 87 IntReg[N] register format
Field | bit(s) | Description
Priority | 3:0 | Interrupt priority
Type | 5:4 | Determines the triggering conditions for the interrupt: 00 - Positive edge, 10 - Negative edge, 01 - Positive level, 11 - Negative level
Mask | 6 | Mask bit. 1 - Interrupts from this source are enabled, 0 - Interrupts from this source are disabled. Note that there may be additional masks in operation at the source of the interrupt.
Reserved | 31:7 | Reserved. Write as 0.

[1867] Once an interrupt is received the interrupt controller determines the priority and maps the programmed priority to the appropriate CPU priority levels, and then issues an interrupt to the CPU. The programmed interrupt priority maps directly to the LEON CPU interrupt levels. Level 0 is no interrupt. Level 15 is the highest interrupt level.

[1868] 14.1 Interrupt Preemption

[1869] With standard LEON pre-emption an interrupt can only be pre-empted by an interrupt with a higher priority level. If an interrupt with the same priority level (1 to 14) as the interrupt being serviced becomes pending then it is not acknowledged until the current service routine has completed. Note that the level 15 interrupt is a special case, in that the LEON processor will continue to take level 15 interrupts (i.e. re-enter the ISR) as long as level 15 is asserted on the icu_cpu_ilevel bus. Level 0 is also a special case, in that LEON considers level 0 interrupts as no interrupt, and will not issue an acknowledge when level 0 is presented on the icu_cpu_ilevel bus.

[1870] Thus when pre-emption is required, interrupts should be programmed to different levels as interrupt priorities of the same level have no guaranteed servicing order. Should several interrupt sources be programmed with the same priority level, the lowest value interrupt source will be serviced first and so on in increasing order.

[1871] The interrupt is directly acknowledged by the CPU and the ICU automatically clears the pending bit of the lowest value pending interrupt source mapped to the acknowledged interrupt level.

[1872] All interrupt controller registers are only accessible in supervisor data mode. If the user code wishes to mask an interrupt it must request this from the supervisor and the supervisor software will resolve user access levels.

[1873] 14.2 Interrupt Sources

[1874] The mapping of interrupt sources to interrupt vectors (and therefore IntReg[N] registers) is shown in Table 88 below. Please refer to the appropriate section of this specification for more details of the interrupt sources.

TABLE 88 Interrupt sources vector table
Vector | Source | Description
0 | Timers | WatchDog Timer Update request
1 | Timers | Generic Timer 1 interrupt
2 | Timers | Generic Timer 2 interrupt
3 | PCU | PEP Sub-system Interrupt - TE finished band
4 | PCU | PEP Sub-system Interrupt - LBD finished band
5 | PCU | PEP Sub-system Interrupt - CDU finished band
6 | PCU | PEP Sub-system Interrupt - CDU error
7 | PCU | PEP Sub-system Interrupt - PCU finished band
8 | PCU | PEP Sub-system Interrupt - PCU Invalid address interrupt
9 | PHI | PEP Sub-system Interrupt - PHI Line Sync Interrupt
10 | PHI | PEP Sub-system Interrupt - PHI Buffer underrun
11 | PHI | PEP Sub-system Interrupt - PHI Page finished
12 | PHI | PEP Sub-system Interrupt - PHI Print ready
13 | SCB | USB Host interrupt
14 | SCB | USB Device interrupt
15 | SCB | ISI interrupt
16 | SCB | DMA interrupt
17 | LSS | LSS interrupt, LSS interface 0 interrupt request
18 | LSS | LSS interrupt, LSS interface 1 interrupt request
19-28 | GPIO | GPIO general purpose interrupts
29 | Timers | Generic Timer 3 interrupt

[1875] 14.3 Implementation

[1876] 14.3.1 Definitions of I/O

TABLE 89 Interrupt Controller Unit I/O definition
Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System Clock
prst_n | 1 | In | System reset, synchronous active low
CPU interface
cpu_adr[7:2] | 6 | In | CPU address bus. Only 6 bits are required to decode the address space for the ICU block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
icu_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_icu_sel | 1 | In | Block select from the CPU. When cpu_icu_sel is high both cpu_adr and cpu_dataout are valid
icu_cpu_rdy | 1 | Out | Ready signal to the CPU. When icu_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the ICU block and for a read cycle this means the data on icu_cpu_data is valid.
icu_cpu_ilevel[3:0] | 4 | Out | Indicates the priority level of the current active interrupt.
cpu_iack | 1 | In | Interrupt request acknowledge from the LEON core.
cpu_icu_ilevel[3:0] | 4 | In | Interrupt acknowledged level from the LEON core
icu_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access, 01 - User data access, 10 - Supervisor program access, 11 - Supervisor data access
icu_cpu_debug_valid | 1 | Out | Debug Data valid on icu_cpu_data bus. Active high
Interrupts
tim_icu_wd_irq | 1 | In | Watchdog timer interrupt signal from the Timers block
tim_icu_irq[2:0] | 3 | In | Generic timer interrupt signals from the Timers block
gpio_icu_irq[9:0] | 10 | In | GPIO pin interrupts
usb_icu_irq[1:0] | 2 | In | USB host and device interrupts from the SCB. Bit 0 - USB Host interrupt, Bit 1 - USB Device interrupt
isi_icu_irq | 1 | In | ISI interrupt from the SCB
dma_icu_irq | 1 | In | DMA interrupt from the SCB
lss_icu_irq[1:0] | 2 | In | LSS interface interrupt request
cdu_finishedband | 1 | In | Finished band interrupt request from the CDU
cdu_icu_jpegerror | 1 | In | JPEG error interrupt from the CDU
lbd_finishedband | 1 | In | Finished band interrupt request from the LBD
te_finishedband | 1 | In | Finished band interrupt request from the TE
pcu_finishedband | 1 | In | Finished band interrupt request from the PCU
pcu_icu_address_invalid | 1 | In | Invalid address interrupt request from the PCU
phi_icu_underrun | 1 | In | Buffer underrun interrupt request from the PHI
phi_icu_page_finish | 1 | In | Page finished interrupt request from the PHI
phi_icu_print_rdy | 1 | In | Print ready interrupt request from the PHI
phi_icu_linesync_int | 1 | In | Line sync interrupt request from the PHI

[1877] 14.3.2 Configuration Registers

[1878] The configuration registers in the ICU are programmed via the CPU interface. Refer to section 11.4 on page 69 for a description of the protocol and timing diagrams for reading and writing registers in the ICU. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the ICU. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of icu_cpu_data. Table 90 lists the configuration registers in the ICU block.

[1879] The ICU block will only allow supervisor data mode accesses (i.e. cpu_acode[1:0]=SUPERVISOR_DATA). All other accesses will result in icu_cpu_berr being asserted.

TABLE 90 ICU Register Map
Address (ICU_base +) | Register | #bits | Reset | Description
0x00-0x74 | IntReg[29:0] | 30 × 7 | 0x00 | Interrupt vector configuration register
0x88 | IntClear | 30 | 0x0000_0000 | Interrupt pending clear register. If written with a one it clears the corresponding interrupt. Bits[29:0] - Interrupt sources 29 to 0 (Reads as zero)
0x90 | IntPending | 30 | 0x0000_0000 | Interrupt pending register (Read Only). Bits[29:0] - Interrupt sources 29 to 0
0xA0 | IntSource | 5 | 0x1F | Indicates the interrupt source of the last acknowledged interrupt. The NoInterrupt value is defined as all bits set to one. (Read Only)
0xC0 | DebugSelect[7:2] | 6 | 0x00 | Debug address select. Indicates the address of the register to report on the icu_cpu_data bus when it is not otherwise being used.

[1880] 14.3.3 ICU Partition

[1881] 14.3.4 Interrupt Detect

[1882] The ICU contains multiple instances of the interrupt detect block, one per interrupt source. The interrupt detect block examines the interrupt source signal, and determines whether it should generate a request pending (int_pend) based on the configured interrupt type and the interrupt source conditions. If the interrupt is not masked the interrupt will be reflected to the interrupt arbiter via the int_active signal. Once an interrupt is pending it remains pending until the interrupt is accepted by the CPU or it is level sensitive and gets removed. Masking a pending interrupt has the effect of removing the interrupt from arbitration but the interrupt will still remain pending.

[1883] When the CPU accepts the interrupt (using the normal ISR mechanism), the interrupt controller automatically generates an interrupt clear for that interrupt source (cpu_int_clear). Alternatively if the interrupt is masked, the CPU can determine pending interrupts by polling the IntPending registers. Any active pending interrupts can be cleared by the CPU without using an ISR via the IntClear registers.

[1884] Should an interrupt clear signal (either from the interrupt clear unit or the CPU) and a new interrupt condition happen at the same time, the interrupt will remain pending. In the particular case of a level sensitive interrupt, if the level remains the interrupt will stay active regardless of the clear signal.

[1885] The logic is shown below:

    mask = int_config[6]
    type = int_config[5:4]
    int_pend = last_int_pend    // the last pending interrupt
    // update the pending FF
    // test for interrupt condition
    if (type == NEG_LEVEL) then
        int_pend = NOT(int_src)
    elsif (type == POS_LEVEL) then
        int_pend = int_src
    elsif ((type == POS_EDGE) AND (int_src == 1) AND (last_int_src == 0)) then
        int_pend = 1
    elsif ((type == NEG_EDGE) AND (int_src == 0) AND (last_int_src == 1)) then
        int_pend = 1
    elsif ((int_clear == 1) OR (cpu_int_clear == 1)) then
        int_pend = 0
    else
        int_pend = last_int_pend    // stay the same as before
    // mask the pending bit
    if (mask == 1) then
        int_active = int_pend
    else
        int_active = 0
    // assign the registers
    last_int_src = int_src
    last_int_pend = int_pend
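The ordering of the tests in the interrupt detect logic is what makes a new interrupt condition win over a simultaneous clear. The Python sketch below models the next-state function for the pending bit; the function name is hypothetical, the Type encodings are taken from the IntReg[N] format table, and the two clear sources are merged into a single flag for brevity.

```python
POS_EDGE, POS_LEVEL, NEG_EDGE, NEG_LEVEL = 0b00, 0b01, 0b10, 0b11


def next_pending(int_type, int_src, last_src, last_pend, clear):
    """Return the next value of the pending bit for one interrupt source."""
    if int_type == NEG_LEVEL:
        return int(not int_src)          # pending follows the (inverted) level
    if int_type == POS_LEVEL:
        return int_src
    if int_type == POS_EDGE and int_src == 1 and last_src == 0:
        return 1                         # new edge wins over any clear
    if int_type == NEG_EDGE and int_src == 0 and last_src == 1:
        return 1
    if clear:
        return 0                         # clear only applies when no new edge
    return last_pend                     # otherwise hold
```

Because the edge tests come before the clear test, an edge that arrives in the same cycle as a clear leaves the source pending, and a level-sensitive source ignores the clear for as long as the level persists.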

[1886] 14.3.5 Interrupt Arbiter

[1887] The interrupt arbiter logic arbitrates a winning interrupt request from multiple pending requests based on configured priority. It generates the interrupt to the CPU by setting icu_cpu_ilevel to a non-zero value. The priority of the interrupt is reflected in the value assigned to icu_cpu_ilevel: the higher the value the higher the priority, with 15 being the highest and 0 considered no interrupt.

    // arbitrate with the current winner
    win_int_ilevel = 0
    for (i=0; i<30; i++) {
        if (int_active[i] == 1) then {
            if (int_config[i][3:0] > win_int_ilevel[3:0]) then
                win_int_ilevel[3:0] = int_config[i][3:0]
        }
    }
    // assign the CPU interrupt level
    int_ilevel = win_int_ilevel[3:0]
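In software terms the arbiter is a maximum over the priorities of the active sources, with zero as the "no interrupt" result. A minimal Python sketch, using a hypothetical function name and plain lists in place of the hardware registers:

```python
def arbitrate(int_active, priorities):
    """Return the interrupt level to present to the CPU (0 = no interrupt)."""
    win = 0
    for active, prio in zip(int_active, priorities):
        level = prio & 0xF               # IntReg[N] priority field, bits 3:0
        if active and level > win:
            win = level
    return win
```

A source programmed with priority 0 can never raise a request under this scheme, since it cannot exceed the initial winner value.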

[1888] 14.3.6 Interrupt Clear Unit

[1889] The interrupt clear unit is responsible for accepting an interrupt acknowledge from the CPU, determining which interrupt source generated the interrupt, clearing the pending bit for that source and updating the IntSource register.

[1890] When an interrupt acknowledge is received from the CPU, the interrupt clear unit searches through each interrupt source looking for interrupt sources that match the acknowledged interrupt level (cpu_icu_ilevel) and determines the winning interrupt (lower interrupt source numbers have higher priority). When found the interrupt source pending bit is cleared and the IntSource register is updated with the interrupt source number.

[1891] The LEON interrupt acknowledge mechanism automatically disables all other interrupts temporarily until it has correctly saved state and jumped to the ISR routine. It is the responsibility of the ISR to re-enable the interrupts. To prevent the IntSource register indicating the incorrect source for an interrupt level, the ISR must read and store the IntSource value before re-enabling the interrupts via the Enable Traps (ET) field in the Processor State Register (PSR) of the LEON.

[1892] See section 11.9 on page 104 for a complete description of the interrupt handling procedure. After reset the state machine remains in Idle state until an interrupt acknowledge is received from the CPU (indicated by cpu_iack). When the acknowledge is received the state machine transitions to the Compare state, resetting the source counter (cnt) to the number of interrupt sources.

[1893] While in the Compare state the state machine cycles through each possible interrupt source in decrementing order. For each active interrupt source the programmed priority (int_priority[cnt][3:0]) is compared with the acknowledged interrupt level from the CPU (cpu_icu_ilevel); if they match then the interrupt is considered the new winner. This implies the last interrupt source checked has the highest priority, i.e. interrupt source zero has the highest priority and the first source checked has the lowest priority. After all interrupt sources are checked the state machine transitions to the IntClear state, and updates the int_source register on the transition.

[1894] Should there be no active interrupts for the acknowledged level (e.g. a level sensitive interrupt was removed), the IntSource register will be set to NoInterrupt. NoInterrupt is defined as the highest possible value that IntSource can be set to (in this case 0x1F), and the state machine will return to Idle.
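The Compare state search can be summarised as a descending scan in which each match overwrites the previous winner, so the lowest-numbered matching source is the one finally reported. The Python sketch below is an illustration only: the function name is hypothetical, the scan is done in one call rather than spread over clock cycles, and list inputs stand in for the hardware signals.

```python
NO_INTERRUPT = 0x1F   # IntSource value when no source matches


def find_source_to_clear(int_active, priorities, ack_level):
    """Return the source number to clear for an acknowledged level."""
    winner = NO_INTERRUPT
    # Check sources in decrementing order; the last match (lowest number) wins.
    for src in range(len(int_active) - 1, -1, -1):
        if int_active[src] and (priorities[src] & 0xF) == ack_level:
            winner = src
    return winner
```

If a level-sensitive source has been withdrawn before the acknowledge arrives, no entry matches and the function returns the NoInterrupt value.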

[1895] The exact number of compares performed per clock cycle is dependent on the number of interrupts and on the logic area to logic speed trade-off, and is left to the implementer to determine. A comparison of all interrupt sources must complete within 8 clock cycles (determined by the CPU acknowledge hardware).

[1896] When in the IntClear state the state machine has determined the interrupt source to clear (indicated by the int_source register). It resets the pending bit for that interrupt source, transitions back to the Idle state and waits for the next acknowledge from the CPU.

[1897] The minimum time between successive interrupt acknowledges from the CPU is 8 cycles.

[1898] 15 Timers Block (TIM)

[1899] The Timers block contains general purpose timers, a watchdog timer and a timing pulse generator for use in other sections of SoPEC.

[1900] 15.1 Watchdog Timer

[1901] The watchdog timer is a 32 bit counter value which counts down each time a timing pulse is received. The period of the timing pulse is selected by the WatchDogUnitSel register. The value at any time can be read from the WatchDogTimer register and the counter can be reset by writing a non-zero value to the register. When the counter transitions from 1 to 0, a system wide reset will be triggered as if the reset came from a hardware pin.

[1902] The watchdog timer can be polled by the CPU and reset each time it gets close to 1, or alternatively a threshold (WatchDogIntThres) can be set to trigger an interrupt for the watchdog timer to be serviced by the CPU. If the WatchDogIntThres is set to N, then the interrupt will be triggered on the N to N−1 transition of the WatchDogTimer. This interrupt can be effectively masked by setting the threshold to zero. The watchdog timer can be disabled, without causing a reset, by writing zero to the WatchDogTimer register.

[1903] 15.2 Timing Pulse Generator

[1904] The timing block contains a timing pulse generator clocked by the system clock, used to generate timing pulses of programmable periods. The period is programmed by accessing the TimerStartValue registers. Each pulse is of one system clock duration and is active high, with the pulse period accurate to the system clock frequency. The periods after reset are set to 1 μs, 100 μs and 10 ms.

[1905] The timing pulse generator also contains a 64-bit free running counter that can be read or reset by accessing the FreeRunCount registers. The free running counter can be used to determine elapsed time between events at system clock accuracy, or could be used as an input source in a low-security random number generator.

[1906] 15.3 Generic Timers

[1907] SoPEC contains 3 programmable generic timing counters, for use by the CPU to time the system. The timers are programmed to a particular value and count down each time a timing pulse is received. When a particular timer decrements from 1 to 0, an interrupt is generated. The counter can be programmed to automatically restart the count, or wait until re-programmed by the CPU. At any time the status of the counter can be read from GenCntValue, or can be reset by writing to the GenCntValue register. The auto-restart is activated by setting the GenCntAuto register; when activated the counter restarts at GenCntStartValue. A counter can be stopped or started at any time, without affecting the contents of the GenCntValue register, by writing a 1 or 0 to the relevant GenCntEnable register.

[1908] 15.4 Implementation

[1909] 15.4.1 Definitions of I/O

TABLE 91 Timers block I/O definition

Port name | Pins | I/O | Description
----------|------|-----|------------
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
tim_pulse[2:0] | 3 | Out | Timers block generated timing pulses, each one pclk wide. 0 - Nominal 1 μs pulse; 1 - Nominal 100 μs pulse; 2 - Nominal 10 ms pulse
CPU interface
cpu_adr[6:2] | 5 | In | CPU address bus. Only 5 bits are required to decode the address space for the TIM block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
tim_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_tim_sel | 1 | In | Block select from the CPU. When cpu_tim_sel is high both cpu_adr and cpu_dataout are valid
tim_cpu_rdy | 1 | Out | Ready signal to the CPU. When tim_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the TIM block and for a read cycle this means the data on tim_cpu_data is valid.
tim_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
tim_cpu_debug_valid | 1 | Out | Debug Data valid on tim_cpu_data bus. Active high
Miscellaneous
tim_icu_wd_irq | 1 | Out | Watchdog timer interrupt signal to the ICU block
tim_icu_irq[2:0] | 3 | Out | Generic timer interrupt signals to the ICU block
tim_cpr_reset_n | 1 | Out | Watchdog timer system reset.

[1910] 15.4.2 Timers Sub-Block Partition

[1911] 15.4.3 Watchdog Timer

[1912] The watchdog timer counts down from a pre-programmed value, and generates a system wide reset when equal to one. When the counter passes a pre-programmed threshold (wdog_tim_thres) value an interrupt is generated (tim_icu_wd_irq) requesting the CPU to update the counter. Setting the counter to zero disables the watchdog reset. In supervisor mode the watchdog counter can be written to or read from at any time; in user mode access is denied. Any accesses in user mode will generate a bus error. The counter logic is given by

    if (wdog_wen == 1) then
        wdog_tim_cnt = write_data // load new data
    elsif (wdog_tim_cnt == 0) then
        wdog_tim_cnt = wdog_tim_cnt // count disabled
    elsif (cnt_en == 1) then
        wdog_tim_cnt--
    else
        wdog_tim_cnt = wdog_tim_cnt

The timer decode logic is

    if ((wdog_tim_cnt == wdog_tim_thres) AND (wdog_tim_cnt != 0) AND (cnt_en == 1)) then
        tim_icu_wd_irq = 1
    else
        tim_icu_wd_irq = 0
    // reset generator logic
    if (wdog_tim_cnt == 1) AND (cnt_en == 1) then
        tim_cpr_reset_n = 0
    else
        tim_cpr_reset_n = 1
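The watchdog counter and decode logic above can be exercised with a small behavioural model. The following Python sketch is an assumption-laden illustration rather than the actual implementation: the function name and the choice to return the next count alongside the decoded outputs are hypothetical, while the signal names follow the pseudocode.

```python
def watchdog_step(cnt, thres, cnt_en, wen=False, write_data=0):
    """Model one clock of the watchdog timer.

    Returns (next_cnt, tim_icu_wd_irq, tim_cpr_reset_n). The decode outputs
    are derived from the current count, as in the pseudocode.
    """
    irq = 1 if (cnt == thres and cnt != 0 and cnt_en) else 0
    reset_n = 0 if (cnt == 1 and cnt_en) else 1  # active-low system reset

    if wen:
        nxt = write_data & 0xFFFFFFFF  # load new data (32-bit counter)
    elif cnt == 0:
        nxt = 0                        # count disabled
    elif cnt_en:
        nxt = cnt - 1                  # count down on a timing pulse
    else:
        nxt = cnt                      # hold
    return nxt, irq, reset_n
```

Calling the function repeatedly with cnt_en set models successive timing pulses; a count of zero never asserts the interrupt or the reset, which matches the statement that writing zero disables the watchdog.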

[1913] 15.4.4 Generic Timers

[1914] The generic timers block consists of 3 identical counters. A timer is set to a pre-configured value (GenCntStartValue) and counts down once per selected timing pulse (gen_unit_sel). The timer can be enabled or disabled at any time (gen_tim_en); when disabled the counter is stopped but not cleared. The timer can be set to automatically restart (gen_tim_auto) after it generates an interrupt. In supervisor mode a timer can be written to or read from at any time; in user mode access is determined by the GenCntUserModeEnable register settings. The counter logic is given by

    if (gen_wen == 1) then
        gen_tim_cnt = write_data
    elsif ((cnt_en == 1) AND (gen_tim_en == 1)) then
        if (gen_tim_cnt == 1) OR (gen_tim_cnt == 0) then // counter may need re-starting
            if (gen_tim_auto == 1) then
                gen_tim_cnt = gen_tim_cnt_st_value
            else
                gen_tim_cnt = 0 // hold count at zero
        else
            gen_tim_cnt--
    else
        gen_tim_cnt = gen_tim_cnt

The decode logic is

    if (gen_tim_cnt == 1) AND (cnt_en == 1) AND (gen_tim_en == 1) then
        tim_icu_irq = 1
    else
        tim_icu_irq = 0
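As a rough illustration of the generic timer behaviour, the Python sketch below models one counter for one clock. The function name and argument order are assumptions made for the example; the signal names mirror the pseudocode above.

```python
def generic_timer_step(cnt, start_value, cnt_en, tim_en, auto,
                       wen=False, write_data=0):
    """Model one clock of a generic timer; returns (next_cnt, tim_icu_irq)."""
    # Interrupt is decoded from the current count: raised on the 1 -> 0 step.
    irq = 1 if (cnt == 1 and cnt_en and tim_en) else 0

    if wen:
        nxt = write_data & 0xFFFFFFFF
    elif cnt_en and tim_en:
        if cnt in (0, 1):
            # Counter may need re-starting.
            nxt = start_value if auto else 0
        else:
            nxt = cnt - 1
    else:
        nxt = cnt  # disabled timers are stopped but not cleared
    return nxt, irq
```

With auto-restart enabled and a start value of 3, the model raises the interrupt once every three timing pulses, since the count steps 3, 2, 1 and then reloads.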

[1915] 15.4.5 Timing Pulse Generator

[1916] The timing pulse generator contains a general free running 64-bit timer and 3 timing pulse generators producing timing pulses of one cycle duration with a programmable period. The periods are programmed by changing the TimerStartValue registers, and have nominal starting values of 1 μs, 100 μs and 10 ms. In supervisor mode the free running timer register can be written to or read from at any time; in user mode access is denied. The status of each of the timers can be read by accessing the PulseTimerStatus registers in supervisor mode. Any accesses in user mode will result in a bus error.

[1917] 15.4.5.1 Free Run Timer

[1918] The increment logic block increments the timer count on each clock cycle. The counter wraps around to zero and continues incrementing if overflow occurs. When the timing register (FreeRunCount) is written to, the configuration registers block will set free_run_wen high for a clock cycle and the value on write_data will become the new count value. If free_run_wen[1] is 1 the higher 32 bits of the counter will be written to, otherwise if free_run_wen[0] is 1 the lower 32 bits are written to. It is the responsibility of software to handle these writes in a sensible manner.

[1919] The increment logic is given by

    if (free_run_wen[1] == 1) then
        free_run_cnt[63:32] = write_data
    elsif (free_run_wen[0] == 1) then
        free_run_cnt[31:0] = write_data
    else
        free_run_cnt++
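The half-word write behaviour of the free running counter can be illustrated with the following Python sketch. It is a behavioural approximation only; the function name and the encoding of free_run_wen as a 2-bit integer are assumptions for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def free_run_step(cnt, wen=0, write_data=0):
    """Model one clock of the 64-bit free running counter.

    wen is a 2-bit value: bit 1 writes the upper word, bit 0 the lower word.
    """
    if wen & 0b10:
        # Replace bits 63:32, keep bits 31:0.
        return ((write_data & MASK32) << 32) | (cnt & MASK32)
    if wen & 0b01:
        # Replace bits 31:0, keep bits 63:32.
        return (cnt & ~MASK32 & MASK64) | (write_data & MASK32)
    return (cnt + 1) & MASK64  # wraps to zero on overflow
```

Because each write touches only one half, software that sets a full 64-bit value needs two writes, and the counter keeps incrementing between them; this is presumably the kind of case the text has in mind when it leaves sensible handling to software.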

[1920] 15.4.5.2 Pulse Timers

[1921] The pulse timer logic generates timing pulses of 1 clock cycle length and programmable period. Nominally they generate pulse periods of 1 μs, 100 μs and 10 ms. The logic for timer 0 is given by:

    // Nominal 1 us generator
    if (pulse_0_cnt == 0) then
        pulse_0_cnt = timer_start_value[0]
        tim_pulse[0] = 1
    else
        pulse_0_cnt--
        tim_pulse[0] = 0

[1922] The logic for timer 1 is given by:

    // 100 us generator
    if ((pulse_1_cnt == 0) AND (tim_pulse[0] == 1)) then
        pulse_1_cnt = timer_start_value[1]
        tim_pulse[1] = 1
    elsif (tim_pulse[0] == 1) then
        pulse_1_cnt--
        tim_pulse[1] = 0
    else
        pulse_1_cnt = pulse_1_cnt
        tim_pulse[1] = 0

[1923] The logic for timer 2 is given by:

    // 10 ms generator
    if ((pulse_2_cnt == 0) AND (tim_pulse[1] == 1)) then
        pulse_2_cnt = timer_start_value[2]
        tim_pulse[2] = 1
    elsif (tim_pulse[1] == 1) then
        pulse_2_cnt--
        tim_pulse[2] = 0
    else
        pulse_2_cnt = pulse_2_cnt
        tim_pulse[2] = 0
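The cascade of the three pulse timers can be checked with a short simulation. The Python sketch below is a simplified model under stated assumptions: the helper names are hypothetical, and each stage sees the previous stage's pulse in the same clock rather than one clock later, which does not change the pulse periods.

```python
def _stage(cnt, start, tick):
    """One clock of a pulse timer; returns (next_cnt, pulse)."""
    if tick and cnt == 0:
        return start, 1   # reload and emit a one-cycle pulse
    if tick:
        return cnt - 1, 0
    return cnt, 0         # hold between ticks


def simulate_pulses(start_values, n_clocks):
    """Count the pulses seen on tim_pulse[0..2] over n_clocks clocks."""
    cnt = [0, 0, 0]
    pulses = [0, 0, 0]
    for _ in range(n_clocks):
        tick = 1  # timer 0 is clocked by pclk itself
        for i in range(3):
            cnt[i], tick = _stage(cnt[i], start_values[i], tick)
            pulses[i] += tick
    return pulses


def nominal_periods(start_values):
    """Pulse period of each timer in pclk cycles (start value + 1 per stage)."""
    periods, total = [], 1
    for start in start_values:
        total *= start + 1
        periods.append(total)
    return periods
```

Each stage divides its input by its start value plus one, so start values of 0x7F, 0x63 and 0x63 give periods of 128, 12800 and 1280000 pclk cycles respectively.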

[1924] 15.4.6 Configuration Registers

[1925] The configuration registers in the TIM are programmed via the CPU interface. Refer to section 11.4.3 on page 69 for a description of the protocol and timing diagrams for reading and writing registers in the TIM. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the TIM. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of tim_cpu_data. Table 92 lists the configuration registers in the TIM block.

TABLE 92 Timers Register Map

Address (TIM_base+) | Register | #bits | Reset | Description
--------------------|----------|-------|-------|------------
0x00 | WatchDogUnitSel | 2 | 0x0 | Specifies the units used for the watchdog timer: 0 - Nominal 1 μs pulse; 1 - Nominal 100 μs pulse; 2 - Nominal 10 ms pulse; 3 - pclk
0x04 | WatchDogTimer | 32 | 0xFFFF_FFFF | Specifies the number of units to count before watchdog timer triggers.
0x08 | WatchDogIntThres | 32 | 0x0000_0000 | Specifies the threshold value below which the watchdog timer issues an interrupt
0x0C-0x10 | FreeRunCount[1:0] | 2 × 32 | 0x0000_0000 | Direct access to the free running counter register. Bus 0 - Access to bits 31-0; Bus 1 - Access to bits 63-32
0x14 to 0x1C | GenCntStartValue[2:0] | 3 × 32 | 0x0000_0000 | Generic timer counter start value, number of units to count before event
0x20 to 0x28 | GenCntValue[2:0] | 3 × 32 | 0x0000_0000 | Direct access to generic timer counter registers
0x2C to 0x34 | GenCntUnitSel[2:0] | 3 × 2 | 0x0 | Generic counter unit select. Selects the timing units used with corresponding counter: 0 - Nominal 1 μs pulse; 1 - Nominal 100 μs pulse; 2 - Nominal 10 ms pulse; 3 - pclk
0x38 to 0x40 | GenCntAuto[2:0] | 3 × 1 | 0x0 | Generic counter auto re-start select. When high timer automatically restarts, otherwise timer stops.
0x44 to 0x4C | GenCntEnable[2:0] | 3 × 1 | 0x0 | Generic counter enable. 0 - Counter disabled; 1 - Counter enabled
0x50 | GenCntUserModeEnable | 3 | 0x0 | User Mode Access enable to generic timer configuration register. When 1 user access is enabled. Bit 0 - Generic timer 0; Bit 1 - Generic timer 1; Bit 2 - Generic timer 2
0x54 to 0x5C | TimerStartValue[2:0] | 3 × 8 | 0x7F, 0x63, 0x63 | Timing pulse generator start value. Indicates the start value for each of the timing pulse timers. For timer 0 the start value specifies the timer period in pclk cycles - 1. For timer 1 the start value specifies the timer period in timer 0 intervals - 1. For timer 2 the start value specifies the timer period in timer 1 intervals - 1. Nominally the timers generate pulses at 1 μs, 100 μs and 10 ms intervals respectively.
0x60 | DebugSelect[6:2] | 5 | 0x00 | Debug address select. Indicates the address of the register to report on the tim_cpu_data bus when it is not otherwise being used.
Read Only Registers
0x64 | PulseTimerStatus | 24 | 0x00 | Current pulse timer values, and pulses: 7:0 - Timer 0 count; 15:8 - Timer 1 count; 23:16 - Timer 2 count; 24 - Timer 0 pulse; 25 - Timer 1 pulse; 26 - Timer 2 pulse

[1926] 15.4.6.1 Supervisor and User Mode Access

[1927] The configuration registers block examines the CPU access type (cpu_acode signal) and determines if the access is allowed to that particular register, based on configured user access registers. If an access is not allowed the block will issue a bus error by asserting the tim_cpu_berr signal.

[1928] The timers block is fully accessible in supervisor data mode; all registers can be written to and read from. In user mode access is denied to all registers in the block except for the generic timer configuration registers that are granted user data access. User data access for a generic timer is granted by setting the corresponding bit in the GenCntUserModeEnable register. This can only be changed in supervisor data mode. If a particular timer is granted user data access then all registers for configuring that timer will be accessible. For example, if timer 0 is granted user data access the GenCntStartValue[0], GenCntUnitSel[0], GenCntAuto[0], GenCntEnable[0] and GenCntValue[0] registers can all be written to and read from without any restriction.

[1929] Attempts to access a user data mode disabled timer configuration register will result in a bus error. Table 93 details the access modes allowed for registers in the TIM block. In supervisor data mode all registers are accessible. All forbidden accesses will result in a bus error (tim_cpu_berr asserted).

TABLE 93 TIM supervisor and user access modes

Register Address | Registers | Access Permission
-----------------|-----------|------------------
0x00 | WatchDogUnitSel | Supervisor data mode only
0x04 | WatchDogTimer | Supervisor data mode only
0x08 | WatchDogIntThres | Supervisor data mode only
0x0C-0x10 | FreeRunCount | Supervisor data mode only
0x14 | GenCntStartValue[0] | GenCntUserModeEnable[0]
0x18 | GenCntStartValue[1] | GenCntUserModeEnable[1]
0x1C | GenCntStartValue[2] | GenCntUserModeEnable[2]
0x20 | GenCntValue[0] | GenCntUserModeEnable[0]
0x24 | GenCntValue[1] | GenCntUserModeEnable[1]
0x28 | GenCntValue[2] | GenCntUserModeEnable[2]
0x2C | GenCntUnitSel[0] | GenCntUserModeEnable[0]
0x30 | GenCntUnitSel[1] | GenCntUserModeEnable[1]
0x34 | GenCntUnitSel[2] | GenCntUserModeEnable[2]
0x38 | GenCntAuto[0] | GenCntUserModeEnable[0]
0x3C | GenCntAuto[1] | GenCntUserModeEnable[1]
0x40 | GenCntAuto[2] | GenCntUserModeEnable[2]
0x44 | GenCntEnable[0] | GenCntUserModeEnable[0]
0x48 | GenCntEnable[1] | GenCntUserModeEnable[1]
0x4C | GenCntEnable[2] | GenCntUserModeEnable[2]
0x50 | GenCntUserModeEnable | Supervisor data mode only
0x54-0x5C | TimerStartValue[2:0] | Supervisor data mode only
0x60 | DebugSelect | Supervisor data mode only
0x64 | PulseTimerStatus | Supervisor data mode only
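The access rules of Table 93 can be expressed as a small decision function. The Python sketch below is illustrative only: the function name is hypothetical, the timer index is derived from the address layout in the register map, and treating program accesses (cpu_acode 00 and 10) as forbidden is an assumption that the text does not state explicitly.

```python
SUPERVISOR_DATA = 0b11  # cpu_acode encoding from the I/O table
USER_DATA = 0b01


def tim_access_allowed(addr, acode, user_mode_enable):
    """Return True if the access is permitted, False if it would bus-error.

    addr             -- register offset from TIM_base (byte address)
    user_mode_enable -- 3-bit GenCntUserModeEnable value
    """
    if acode == SUPERVISOR_DATA:
        return True
    if acode == USER_DATA and 0x14 <= addr <= 0x4C:
        # Generic timer registers are laid out in groups of three words,
        # one word per timer, so the word index modulo 3 selects the timer.
        timer = ((addr - 0x14) // 4) % 3
        return bool(user_mode_enable & (1 << timer))
    return False  # assumption: every other access type is refused
```

For example, with GenCntUserModeEnable set to 0b010 a user data access to GenCntValue[1] at offset 0x24 is allowed, while the same access to the watchdog registers is refused regardless of the enable bits.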

[1930] 16 Clocking, Power and Reset (CPR)

[1931] The CPR block provides all of the clock, power enable and reset signals to the SoPEC device.

[1932] 16.1 Powerdown Modes

[1933] The CPR block is capable of powering down certain sections of the SoPEC device. When a section is powered down (i.e. put in sleep mode) no state is retained (except the PSS storage), and the CPU must re-initialize the section before it can be used again.

[1934] For the purpose of powerdown the SoPEC device is divided into sections:

TABLE 94 Powerdown sectioning

Section | Block
--------|------
Print Engine Pipeline Subsystem (Section 0) | PCU, CDU, CFU, LBD, SFU, TE, TFU, HCU, DNC, DWU, LLU, PHI
CPU-DRAM (Section 1) | DRAM, CPU/MMU, DIU, TIM, ROM, LSS, PSS, ICU
ISI Subsystem (Section 2) | ISI (SCB), DMA Ctrl (SCB), GPIO
USB Subsystem (Section 3) | USB (SCB)

[1935] Note that the CPR block is not located in any section. All configuration registers in the CPR block are clocked by an ungateable clock and have special reset conditions.

[1936] 16.1.1 Sleep Mode

[1937] Each section can be put into sleep mode by setting the corresponding bit in the SleepModeEnable register. To re-enable the section the sleep mode bit needs to be cleared and then the section should be reset by writing to the relevant bit in the ResetSection register. Each block within the section should then be re-configured by the CPU.

[1938] If the CPU system (section 1) is put into sleep mode, the SoPEC device will remain in sleep mode until a system level reset is initiated from the reset pin, or a wakeup reset is issued by the SCB block as a result of activity on either the USB or ISI bus. The watchdog timer cannot reset the device as it is also in section 1, and will be in sleep mode.

[1939] If the CPU and ISI subsystem are in sleep mode only a reset from the USB or a hardware reset will re-activate the SoPEC device.

[1940] If all sections are put into sleep mode, then only a system level reset initiated by the reset pin will re-activate the SoPEC device.

[1941] Like all software resets in SoPEC the ResetSection register is active-low, i.e. a 0 should be written to each bit position requiring a reset. The ResetSection register is self-resetting.
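Because the ResetSection register is active-low, the value to write has a 0 in each position to be reset and a 1 elsewhere. A minimal Python sketch of that calculation follows; the helper name is hypothetical and the 4-bit width is taken from the four sections listed above.

```python
def reset_section_value(sections):
    """Return the 4-bit value to write to ResetSection.

    sections -- iterable of section numbers (0 to 3) that should be reset.
    """
    value = 0xF  # all ones: no section reset
    for section in sections:
        if not 0 <= section <= 3:
            raise ValueError("section must be in the range 0..3")
        value &= ~(1 << section)  # clear the bit to request a reset
    return value & 0xF
```

Resetting only section 0 therefore means writing 0xE, and resetting sections 2 and 3 together means writing 0x3.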

[1942] 16.1.2 Sleep Mode Powerdown Procedure

[1943] When powering down a section, the section may retain its current state (although this is not guaranteed). It is possible when powering back up a section that inconsistencies between interface state machines could cause incorrect operation. In order to prevent such a condition from happening, all blocks in a section must be disabled before powering down. This will ensure that blocks are restored in a benign state when powered back up.

[1944] In the case of PEP section units, setting the Go bit to zero will disable the block. The DRAM subsystem can be effectively disabled by setting the RotationSync bit to zero, and the SCB system disabled by setting the DMAAccessEn bits to zero, turning off the DMA access to DRAM. Other CPU subsystem blocks without any DRAM access do not need to be disabled.

[1945] 16.2 Reset Source

[1946] The SoPEC device can be reset by a number of sources. When a reset from an internal source is initiated the reset source register (ResetSrc) stores the reset source value. This register can then be used by the CPU to determine the type of boot sequence required.

[1947] 16.3 Clock Relationship

[1948] The crystal oscillator excites a 32 MHz crystal through the xtalin and xtalout pins. The 32 MHz output is used by the PLL to derive the master VCO frequency of 960 MHz. The master clock is then divided to produce a 320 MHz clock (clk320), a 160 MHz clock (clk160) and a 48 MHz clock (clk48).

[1949] The phase relationship of each clock from the PLL will be defined. The relationship of internal clocks clk320, clk48 and clk160 to xtalin will be undefined.

[1950] At the output of the clock block, the skew between each pclk domain (pclk_section[2:0] and jclk) should be within skew tolerances of their respective domains (defined as less than the hold time of a D-type flip flop).

[1951] The skew between doclk and pclk should also be less than the skew tolerances of their respective domains.

[1952] The usbclk is derived from the PLL output and has no relationship with the other clocks in the system and is considered asynchronous.

[1953] 16.4 PLL Control

[1954] The PLL in SoPEC can be adjusted by programming the PLLRangeA, PLLRangeB, PLLTunebits and PLLMult registers. If these registers are changed by the CPU the values are not updated until the PLLUpdate register is written to. Writing to the PLLUpdate register triggers the PLL control state machine to update the PLL configuration in a safe way. When an update is active (as indicated by the PLLUpdate register) the CPU must not change any of the configuration registers; doing so could cause the PLL to lose lock indefinitely, requiring a hardware reset to recover. Configuring the PLL registers in an inconsistent way can also cause the PLL to lose lock, so care must be taken to keep the PLL configuration within specified parameters.

[1955] The VCO frequency of the PLL is determined by the dividers in the feedback path. PLL output A is used as the feedback source. In the formulas below, PLLRangeA and PLLRangeB denote the divide ratios selected by those registers, not the register values themselves.

VCOfreq = REFCLK × PLLMult × PLLRangeA × External divider

VCOfreq = 32 × 3 × 10 × 1 = 960 MHz.

[1956] In the default PLL setup, PLLMult is set to 3, PLLRangeA is set to 3 which corresponds to a divide by 10, and PLLRangeB is set to 5 which corresponds to a divide by 3.

PLLouta = VCOfreq / PLLRangeA = 960 MHz / 10 = 96 MHz

PLLoutb = VCOfreq / PLLRangeB = 960 MHz / 3 = 320 MHz

[1957] See [16] for complete PLL setup parameters.
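The default PLL arithmetic above can be reproduced with a few lines of Python. This is a worked example rather than a configuration tool: the function name is hypothetical, and its arguments are the divide ratios (10 and 3), not the PLLRangeA and PLLRangeB register encodings (3 and 5), whose full mapping is not given in this section.

```python
def pll_outputs(refclk_mhz=32, pll_mult=3, div_a=10, div_b=3, ext_div=1):
    """Return (vco_mhz, pllouta_mhz, plloutb_mhz) for the given setup.

    div_a and div_b are divide ratios; PLL output A is the feedback source,
    so its divider also appears in the VCO frequency.
    """
    vco = refclk_mhz * pll_mult * div_a * ext_div
    return vco, vco / div_a, vco / div_b
```

With the default arguments the function returns 960 MHz for the VCO, 96 MHz for PLL output A and 320 MHz for PLL output B, matching the figures in the text.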

[1958] 16.5 Implementation

[1959] 16.5.1 Definitions of I/O

TABLE 95 CPR I/O definition

Port name | Pins | I/O | Description
----------|------|-----|------------
Clocks and Resets
xtalin | 1 | In | Crystal input, direct from IO pin.
xtalout | 1 | Inout | Crystal output, direct to IO pin.
pclk_section[3:0] | 4 | Out | System clocks for each section
doclk | 1 | Out | Data out clock (2× pclk) for the PHI block
jclk | 1 | Out | Gated version of system clock used to clock the JPEG decoder core in the CDU
usbclk | 1 | Out | USB clock, nominally at 48 MHz
jclk_enable | 1 | In | Gating signal for jclk. When 1 jclk is enabled
reset_n | 1 | In | Reset signal from the reset_n pin
usb_cpr_reset_n | 1 | In | Reset signal from the USB block
isi_cpr_reset_n | 1 | In | Reset signal from the ISI block
tim_cpr_reset_n | 1 | In | Reset signal from watchdog timer.
gpio_cpr_wakeup | 1 | In | SoPEC wake up from the GPIO, active high.
prst_n_section[3:0] | 4 | Out | System resets for each section, synchronous active low
dorst_n | 1 | Out | Reset for PHI block, synchronous to doclk
jrst_n | 1 | Out | Reset for JPEG decoder core in CDU block, synchronous to jclk
usbrst_n | 1 | Out | Reset for the USB block, synchronous to usbclk
CPU interface
cpu_adr[5:2] | 4 | In | CPU address bus. Only 4 bits are required to decode the address space for the CPR block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
cpr_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_cpr_sel | 1 | In | Block select from the CPU. When cpu_cpr_sel is high both cpu_adr and cpu_dataout are valid
cpr_cpu_rdy | 1 | Out | Ready signal to the CPU. When cpr_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the block and for a read cycle this means the data on cpr_cpu_data is valid.
cpr_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
cpr_cpu_debug_valid | 1 | Out | Debug Data valid on cpr_cpu_data bus. Active high

[1960] 16.5.2 Configuration Registers

[1961] The configuration registers in the CPR are programmed via the CPU interface. Refer to section 11.4 on page 69 for a description of the protocol and timing diagrams for reading and writing registers in the CPR. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the CPR. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of cpr_cpu_data. Table 96 lists the configuration registers in the CPR block.

[1962] The CPR block will only allow supervisor data mode accesses (i.e. cpu_acode[1:0]=SUPERVISOR_DATA). All other accesses will result in cpr_cpu_berr being asserted.

TABLE 96 CPR Register Map

Address (CPR_base+) | Register | #bits | Reset | Description
--------------------|----------|-------|-------|------------
0x00 | SleepModeEnable | 4 | 0x0 (a) | Sleep Mode enable, when high a section of logic is put into powerdown. Bit 0 - Controls section 0; Bit 1 - Controls section 1; Bit 2 - Controls section 2; Bit 3 - Controls section 3. Note that the SleepModeEnable register has special reset conditions. See Section 16.5.6 for details
0x04 | ResetSrc | 5 | 0x1 (a) | Reset Source register, indicating the source of the last reset (or wake-up). Bit 0 - External Reset; Bit 1 - USB wakeup reset; Bit 2 - ISI wakeup reset; Bit 3 - Watchdog timer reset; Bit 4 - GPIO wake-up (Read Only Register)
0x08 | ResetSection | 4 | 0xF | Active-low synchronous reset for each section, self-resetting. Bit 0 - Controls section 0; Bit 1 - Controls section 1; Bit 2 - Controls section 2; Bit 3 - Controls section 3
0x0C | DebugSelect[5:2] | 4 | 0x0 | Debug address select. Indicates the address of the register to report on the cpr_cpu_data bus when it is not otherwise being used.
PLL Control
0x10 | PLLTuneBits | 10 | 0x3BC | PLL tuning bits
0x14 | PLLRangeA | 4 | 0x3 | PLLOUT A frequency selector (defaults to 60 MHz to 125 MHz)
0x18 | PLLRangeB | 3 | 0x5 | PLLOUT B frequency selector (defaults to 200 MHz to 400 MHz)
0x1C | PLLMultiplier | 5 | 0x03 | PLL multiplier selector, defaults to refclk × 3
0x20 | PLLUpdate | 1 | 0x0 | PLL update control. A write (of any value) to this register will cause the PLL to lose lock for ~100 μs. Reading the register indicates the status of the update. 0 - PLL update complete; 1 - PLL update active. No writes to PLLTuneBits, PLLRangeA, PLLRangeB, PLLMultiplier or PLLUpdate are allowed while the PLL update is active.

[1963] a. Reset value depends on reset source. External reset shown.

[1964] 16.5.3 CPR Sub-Block Partition

[1965] 16.5.4 Reset_n deglitch

[1966] The external reset_n signal is deglitched for about 1 μs. reset_n must maintain a state for 1 μs before the state is passed into the rest of the device. All deglitch logic is clocked on bufrefclk.

[1967] 16.5.5 Sync Reset

[1968] The reset synchronizer retimes an asynchronous reset signal to the clock domain that it resets. The circuit prevents the inactive edge of reset occurring when the clock is rising.

[1969] 16.5.6 Reset Generator Logic

[1970] The reset generator logic is used to determine which clock domains should be reset, based on configured reset values (reset_section_n), the external reset (reset_n), watchdog timer reset (tim_cpr_reset_n), the USB reset (usb_cpr_reset_n), the GPIO wakeup control (gpio_cpr_wakeup) and the ISI reset (isi_cpr_reset_n). The reset direct from the IO pin (reset_n) is synchronized and de-glitched before feeding the reset logic.

[1971] All resets are lengthened to at least 16 pclk cycles, regardless of the duration of the input reset. The clock for a particular section must be running for the reset to have an effect. The clocks to each section can be enabled/disabled using the SleepModeEnable register.

[1972] Resets from the ISI or USB block reset everything except their own section (section 2 or 3).

TABLE 97 Reset domains

Reset signal | Domain
-------------|-------
reset_dom[0] | Section 0 pclk domain (PEP)
reset_dom[1] | Section 1 pclk domain (CPU)
reset_dom[2] | Section 2 pclk domain (ISI)
reset_dom[3] | Section 3 usbclk/pclk domain (USB)
reset_dom[4] | doclk domain
reset_dom[5] | jclk domain

[1973] The logic is given by

    if (reset_dg_n == 0) then
        reset_dom[5:0] = 0x00 // reset everything
        reset_src[4:0] = 0x01
        cfg_reset_n = 0
        sleep_mode_en[3:0] = 0x0 // re-awaken all sections
    elsif (tim_cpr_reset_n == 0) then
        reset_dom[5:0] = 0x00 // reset everything except CPR config
        reset_src[4:0] = 0x08
        cfg_reset_n = 1 // CPR config stays the same
        sleep_mode_en[1] = 0 // re-awaken section 1 only (awake already)
    elsif (usb_cpr_reset_n == 0) then
        reset_dom[5:0] = 0x08 // all except USB domain + CPR config
        reset_src[4:0] = 0x02
        cfg_reset_n = 1 // CPR config stays the same
        sleep_mode_en[1] = 0 // re-awaken section 1 only, section 3 is awake
    elsif (isi_cpr_reset_n == 0) then
        reset_dom[5:0] = 0x04 // all except ISI domain + CPR config
        reset_src[4:0] = 0x04
        cfg_reset_n = 1 // CPR config stays the same
        sleep_mode_en[1] = 0 // re-awaken section 1 only, section 2 is awake
    elsif (gpio_cpr_wakeup == 1) then
        reset_dom[5:0] = 0x3C // PEP and CPU sections only
        reset_src[4:0] = 0x10
        cfg_reset_n = 1 // CPR config stays the same
        sleep_mode_en[1] = 0 // re-awaken section 1 only, section 2 is awake
    else
        // propagate resets from reset section register
        reset_dom[5:0] = 0x3F // default to on
        cfg_reset_n = 1 // CPR cfg registers are not in any section
        sleep_mode_en[3:0] = sleep_mode_en[3:0] // stay the same by default
        if (reset_section_n[0] == 0) then
            reset_dom[5] = 0 // jclk domain
            reset_dom[4] = 0 // doclk domain
            reset_dom[0] = 0 // pclk section 0 domain
        if (reset_section_n[1] == 0) then
            reset_dom[1] = 0 // pclk section 1 domain
        if (reset_section_n[2] == 0) then
            reset_dom[2] = 0 // pclk section 2 domain (ISI)
        if (reset_section_n[3] == 0) then
            reset_dom[3] = 0 // USB domain
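The priority ordering of the reset sources can be made concrete with a direct transcription of the logic into Python. This sketch is illustrative only: the function name, the tuple return value and the extra reset_src argument (carrying the previously stored value, which the default branch leaves unchanged) are assumptions.

```python
# reset_dom bits cleared when each ResetSection bit is written low:
# section 0 also resets the doclk (bit 4) and jclk (bit 5) domains.
SECTION_DOMAIN_MASKS = (0x31, 0x02, 0x04, 0x08)


def reset_generator(reset_dg_n, tim_cpr_reset_n, usb_cpr_reset_n,
                    isi_cpr_reset_n, gpio_cpr_wakeup,
                    reset_section_n, sleep_mode_en, reset_src):
    """Return (reset_dom, reset_src, cfg_reset_n, sleep_mode_en).

    reset_dom is active low: a 0 bit resets the corresponding domain.
    """
    wake_cpu = sleep_mode_en & ~0x2 & 0xF  # clear sleep_mode_en[1]
    if reset_dg_n == 0:
        return 0x00, 0x01, 0, 0x0          # external reset: everything
    if tim_cpr_reset_n == 0:
        return 0x00, 0x08, 1, wake_cpu     # watchdog reset
    if usb_cpr_reset_n == 0:
        return 0x08, 0x02, 1, wake_cpu     # all except the USB domain
    if isi_cpr_reset_n == 0:
        return 0x04, 0x04, 1, wake_cpu     # all except the ISI domain
    if gpio_cpr_wakeup == 1:
        return 0x3C, 0x10, 1, wake_cpu     # value as given in the text
    # Default: propagate software resets from the ResetSection register.
    reset_dom = 0x3F
    for section, mask in enumerate(SECTION_DOMAIN_MASKS):
        if not reset_section_n & (1 << section):
            reset_dom &= ~mask
    return reset_dom, reset_src, 1, sleep_mode_en
```

Only the external reset clears cfg_reset_n, which reflects the statement that the CPR configuration registers have special reset conditions.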

[1974] 16.5.7 Sleep Logic

[1975] The sleep logic is used to generate gating signals for each of SoPEC's clock domains. The gate enable (gate_dom) is generated based on the configured sleep_mode_en and the internally generated jclk_enable signal.

[1976] The logic is given by

    // clock gating for sleep modes
    gate_dom[5:0] = 0x0 // default to all clocks on
    if (sleep_mode_en[0] == 1) then // section 0 sleep
        gate_dom[0] = 1 // pclk section 0
        gate_dom[4] = 1 // doclk domain
        gate_dom[5] = 1 // jclk domain
    if (sleep_mode_en[1] == 1) then // section 1 sleep
        gate_dom[1] = 1 // pclk section 1
    if (sleep_mode_en[2] == 1) then // section 2 sleep
        gate_dom[2] = 1 // pclk section 2
    if (sleep_mode_en[3] == 1) then // section 3 sleep
        gate_dom[3] = 1 // usb section 3
    // the jclk can be turned off by CDU signal
    if (jclk_enable == 0) then
        gate_dom[5] = 1

[1977] The clock gating and sleep logic is clocked with the master_pclk clock which is not gated by this logic, but is synchronous to other pclk_section and jclk domains.

[1978] Once a section is in sleep mode it cannot generate a reset to restart the device. For example, if section 1 is in sleep mode then the watchdog timer is effectively disabled and cannot trigger a reset.
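The sleep gating logic reduces to a mapping from sleep_mode_en and jclk_enable to a 6-bit gate mask. The Python sketch below is a hedged illustration; the function name and the integer bit-mask representation of gate_dom are assumptions.

```python
def gate_domains(sleep_mode_en, jclk_enable):
    """Return the 6-bit gate_dom value; a 1 bit gates that clock domain off."""
    gate = 0x00                 # default to all clocks on
    if sleep_mode_en & 0x1:     # section 0 sleep
        gate |= 0b110001        # pclk section 0, doclk and jclk domains
    if sleep_mode_en & 0x2:     # section 1 sleep
        gate |= 0b000010
    if sleep_mode_en & 0x4:     # section 2 sleep
        gate |= 0b000100
    if sleep_mode_en & 0x8:     # section 3 sleep
        gate |= 0b001000
    if not jclk_enable:         # the CDU can turn jclk off independently
        gate |= 0b100000
    return gate
```

Putting section 0 to sleep gates three clocks at once (0x31), whereas each of the other sections gates only its own pclk or USB domain.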

[1979] 16.5.8 Clock Gate Logic

[1980] The clock gate logic is used to safely gate clocks without generating any glitches on the gated clock. When the enable is high the clock is active, otherwise the clock is gated.

[1981] 16.5.9 Clock Generator Logic

[1982] The clock generator block contains the PLL, crystal oscillator, clock dividers and associated control logic. The PLL VCO frequency is at 960 MHz locked to a 32 MHz refclk generated by the crystal oscillator. In test mode the xtalin signal can be driven directly by the test clock generator; the test clock will be reflected on the refclk signal to the PLL.

[1983] 16.5.9.1 Clock Divider A

[1984] The clock divider A block generates the 48 MHz clock from the input 96 MHz clock (pllouta) generated by the PLL. The divider is enabled only when the PLL has acquired lock.

[1985] 16.5.9.2 Clock Divider B

[1986] The clock divider B block generates the 160 MHz clock from the input 320 MHz clock (plloutb) generated by the PLL. The divider is enabled only when the PLL has acquired lock.

[1987] 16.5.9.3 PLL Control State Machine

[1988] The PLL will go out of lock whenever pll_reset goes high (the PLL reset is the only active high reset in the device) or if the configuration bits pll_rangea, pll_rangeb, pll_mult, pll_tune are changed. The PLL control state machine ensures that the rest of the device is protected from glitching clocks while the PLL is being reset or its configuration is being changed.

[1989] In the case of a hardware reset (the reset is deglitched), the state machine first disables the output clocks (via the clk_gate signal), then holds the PLL in reset while its configuration bits are reset to default values. The state machine then releases the PLL reset and waits approximately 100 μs to allow the PLL to regain lock. Once the lock time has elapsed the state machine re-enables the output clocks and resets the remainder of the device via the reset_dg_n signal.

[1990] When the CPU changes any of the configuration registers it must write to the PLLUpdate register to allow the state machine to update the PLL to the new configuration setup. If a PLLUpdate is detected the state machine first gates the output clocks. It then holds the PLL in reset while the PLL configuration registers are updated. Once updated the PLL reset is released and the state machine waits approximately 100 μs for the PLL to regain lock before re-enabling the output clocks. Any write to the PLLUpdate register will cause the state machine to perform the update operation regardless of whether the configuration values changed or not.

[1991] All logic in the clock generator is clocked on bufrefclk which is always an active clock regardless of the state of the PLL.

[1992] 17 ROM Block

[1993] 17.1 Overview

[1994] The ROM block interfaces to the CPU bus and contains the SoPEC boot code. The ROM block consists of the CPU bus interface, the ROM macro and the ChipID macro. The current ROM size is 16 KBytes implemented as a 4096×32 macro. Access to the ROM is not cached because the CPU enjoys fast (no more than one cycle slower than a cache access), unarbitrated access to the ROM. Each SoPEC device is required to have a unique ChipID which is set by blowing fuses at manufacture. IBM's 300 mm ECID macro and a custom 112-bit ECID macro are used to implement the ChipID, offering 224 bits of laser fuses. The exact number of fuse bits to be used for the ChipID will be determined later but all bits are made available to the CPU. The ECID macros allow all 224 bits to be read out in parallel and the ROM block will make all 224 bits available in the FuseChipID[N] registers, which are readable by the CPU in supervisor mode only.

[1995] 17.2 Boot Operation

[1996] The are two boot scenarIOs for the SoPEC device namely afterpower-on and after being awoken from sleep mode. When the device is insleep mode it is hoped that power will actually be removed from theDRAM, CPU and most other peripherals and so the program code will needto be freshly downloaded each time the device wakes up from sleep mode.In order to reduce the wakeup boot time (and hence the perceived printlatency) certain data items are stored in the PSS block (see section18). These data items include the SHA-1 hash digest expected for theprogram(s) to be downloaded, the master/slave SoPEC id and someconfiguration parameters. All of these data items are stored in the PSSby the CPU prior to entering sleep mode. The SHA-1 value stored in thePSS is calculated by the CPU by decrypting the signature of thedownloaded program using the appropriate public key stored in ROM. Thiscompute intensive decryption only needs to take place once as part ofthe power-on boot sequence—subsequent wakeup boot sequences will simplyuse the resulting SHA-1 digest stored in the PSS. Note that the digestonly needs to be stored in the PSS before entering sleep mode and thePSS can be used for temporary storage of any data at all other times.

[1997] The CPU is expected to be in supervisor mode for the entire boot sequence described by the pseudocode below. Note that the boot sequence has not been finalised but is expected to be close to the following:

    if (ResetSrc == 1) then  // Reset was a power-on reset
        configure_sopec      // need to configure peris (USB, ISI, DMA, ICU etc.)
    // Otherwise reset was a wakeup reset so peris etc. were already configured
    PAUSE: wait until IrqSemaphore != 0  // i.e. wait until an interrupt has been serviced
    if (IrqSemaphore == DMAChan0Msg) then
        parse_msg(DMAChan0MsgPtr)  // this routine will parse the message and take any
                                   // necessary action e.g. programming the DMAChannel1 registers
    elsif (IrqSemaphore == DMAChan1Msg) then  // program has been downloaded
        CalculatedHash = gen_sha1(ProgramLocn, ProgramSize)
        if (ResetSrc == 1) then
            ExpectedHash = sig_decrypt(ProgramSig, public_key)
        else
            ExpectedHash = PSSHash
        if (ExpectedHash == CalculatedHash) then
            jmp(ProgramLocn)  // transfer control to the downloaded program
        else
            send_host_msg(“Program Authentication Failed”)
            goto PAUSE
    elsif (IrqSemaphore == timeout) then  // nothing has happened
        if (ResetSrc == 1) then
            sleep_mode()  // put SoPEC into sleep mode to be woken up by USB/ISI activity
        else  // we were woken up but nothing happened
            reset_sopec(PowerOnReset)
    else
        goto PAUSE
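The authentication step of the boot sequence can be illustrated with a minimal Python sketch. It is not the SoPEC boot code: the function names and the `sig_decrypt` callable are hypothetical stand-ins, and the public-key decryption itself is left to the caller because the text does not specify the algorithm. Only the choice of reference digest (decrypted signature on power-on, PSS copy on wakeup) and the SHA-1 comparison are taken from the pseudocode.

```python
import hashlib

POWER_ON_RESET = 1  # ResetSrc value used for a power-on reset in the pseudocode


def expected_hash(reset_src, program_sig, pss_hash, sig_decrypt):
    """Pick the reference digest for the downloaded program.

    sig_decrypt is a hypothetical callable standing in for the public-key
    signature decryption; it is only invoked after a power-on reset.
    """
    if reset_src == POWER_ON_RESET:
        return sig_decrypt(program_sig)
    return pss_hash  # wakeup boot: reuse the digest saved in the PSS


def authenticate(program, reset_src, program_sig, pss_hash, sig_decrypt):
    """Return True when the SHA-1 of the program matches the reference digest."""
    calculated = hashlib.sha1(program).digest()
    return calculated == expected_hash(reset_src, program_sig, pss_hash, sig_decrypt)
```

On a wakeup boot the decryption callable is never invoked, which is the saving the PSS-stored digest is meant to provide.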

[1998] The boot code places no restrictions on the activity of any programs downloaded and authenticated by it other than those imposed by the configuration of the MMU, i.e. the principal function of the boot code is to authenticate that any programs downloaded by it are from a trusted source. It is the responsibility of the downloaded program to ensure that any code it downloads is also authenticated and that the system remains secure. The downloaded program code is also responsible for setting the SoPEC ISIId (see section 12.5 for a description of the ISIId) in a multi-SoPEC system. See the “SoPEC Security Overview” document [9] for more details of the SoPEC security features.

[1999] 17.3 Implementation

[2000] 17.3.1 Definitions of I/O

TABLE 98 ROM Block I/O

Port name | Pins | I/O | Description
Clocks and Resets
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
Pclk | 1 | In | Global clock
CPU Interface
cpu_adr[14:2] | 13 | In | CPU address bus. Only 13 bits are required to decode the address space for this block.
rom_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
cpu_rom_sel | 1 | In | Block select from the CPU. When cpu_rom_sel is high cpu_adr is valid
rom_cpu_rdy | 1 | Out | Ready signal to the CPU. When rom_cpu_rdy is high it indicates the last cycle of the access. For a read cycle this means the data on rom_cpu_data is valid.
rom_cpu_berr | 1 | Out | ROM bus error signal to the CPU indicating an invalid access.

[2001] 17.3.2 Configuration Registers

[2002] The ROM block will only allow read accesses to the FuseChipID registers and the ROM with supervisor data space permissions (i.e. cpu_acode[1:0]=11). Write accesses with supervisor data space permissions will have no effect.

[2003] All other accesses will result in rom_cpu_berr being asserted. The CPU subsystem bus slave interface is described in more detail in section 9.4.3.

TABLE 99 ROM Block Register Map

Address (ROM_base +) | Register | #bits | Reset | Description
0x4000 | FuseChipID0 | 32 | n/a | Value of corresponding fuse bits 31 to 0 of the IBM 112-bit ECID macro. (Read only)
0x4004 | FuseChipID1 | 32 | n/a | Value of corresponding fuse bits 63 to 32 of the IBM 112-bit ECID macro. (Read only)
0x4008 | FuseChipID2 | 32 | n/a | Value of corresponding fuse bits 95 to 64 of the IBM 112-bit ECID macro. (Read only)
0x400C | FuseChipID3 | 16 | n/a | Value of corresponding fuse bits 111 to 96 of the IBM 112-bit ECID macro. (Read only)
0x4010 | FuseChipID4 | 32 | n/a | Value of corresponding fuse bits 31 to 0 of the Custom 112-bit ECID macro. (Read only)
0x4014 | FuseChipID5 | 32 | n/a | Value of corresponding fuse bits 63 to 32 of the Custom 112-bit ECID macro. (Read only)
0x4018 | FuseChipID6 | 32 | n/a | Value of corresponding fuse bits 95 to 64 of the Custom 112-bit ECID macro. (Read only)
0x401C | FuseChipID7 | 16 | n/a | Value of corresponding fuse bits 111 to 96 of the Custom 112-bit ECID macro. (Read only)
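As an illustration of the register layout in Table 99, the following Python sketch recombines eight FuseChipID register values into the two 112-bit fuse fields. The function name is hypothetical, and how software actually derives a ChipID from the fuse bits is not stated in the text; only the bit ranges per register come from the table.

```python
# Register widths for FuseChipID0..FuseChipID7, taken from Table 99.
FUSE_WIDTHS = [32, 32, 32, 16, 32, 32, 32, 16]


def assemble_fuse_bits(regs):
    """Return (ibm_ecid, custom_ecid) as two 112-bit integers.

    regs holds the values read from FuseChipID0..7, in that order.
    Registers 0-3 carry IBM ECID bits 31:0, 63:32, 95:64 and 111:96;
    registers 4-7 carry the same ranges of the custom ECID macro.
    """
    if len(regs) != 8:
        raise ValueError("expected eight FuseChipID register values")
    fields = []
    for base in (0, 4):
        value, shift = 0, 0
        for i in range(4):
            width = FUSE_WIDTHS[base + i]
            value |= (regs[base + i] & ((1 << width) - 1)) << shift
            shift += width
        fields.append(value)
    return tuple(fields)
```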

[2004] 17.3.3 Sub-Block Partition

[2005] IBM offers two variants of its ROM macros: a high performance version (ROMHD) and a low power version (ROMLD). It is likely that the low power version will be used unless some implementation issue requires the high performance version. Both versions offer the same bit density. The sub-block partition diagram below does not include the clocking and test signals for the ROM or ECID macros. The CPU subsystem bus interface is described in more detail in section 11.4.3.

[2006] 17.3.4 Sub-Block Signal Definition

[2007] TABLE 100 ROM Block internal signals

Port name | Width | Description
Clocks and Resets
prst_n | 1 | Global reset. Synchronous to pclk, active low.
Pclk | 1 | Global clock
Internal Signals
rom_adr[11:0] | 12 | ROM address bus
rom_sel | 1 | Select signal to the ROM macro instructing it to access the location at rom_adr
rom_oe | 1 | Output enable signal to the ROM block
rom_data[31:0] | 32 | Data bus from the ROM macro to the CPU bus interface
rom_dvalid | 1 | Signal from the ROM macro indicating that the data on rom_data is valid for the address on rom_adr
fuse_data[31:0] | 32 | Data from the FuseChipID[N] register addressed by fuse_reg_adr
fuse_reg_adr[2:0] | 3 | Indicates which of the FuseChipID registers is being addressed

[2008] 18 Power Safe Storage (PSS) Block

[2009] 18.1 Overview

[2010] The PSS block provides 128 bytes of storage space that will maintain its state when the rest of the SoPEC device is in sleep mode. The PSS is expected to be used primarily for the storage of decrypted signatures associated with downloaded program code but it can also be used to store any information that needs to survive sleep mode (e.g. configuration details). Note that the signature digest only needs to be stored in the PSS before entering sleep mode and the PSS can be used for temporary storage of any data at all other times.

[2011] Prior to entering sleep mode the CPU should store all of the information it will need on exiting sleep mode in the PSS. On emerging from sleep mode the boot code in ROM will read the ResetSrc register in the CPR block to determine which reset source caused the wakeup. The reset source information indicates whether or not the PSS contains valid stored data, and the PSS data determines the type of boot sequence to execute. If for any reason a full power-on boot sequence should be performed (e.g. the printer driver has been updated) then this is simply achieved by initiating a full software reset.

[2012] Note that a reset or a powerdown (powerdown is implemented by clock gating) of the PSS block will not clear the contents of the 128 bytes of storage. If clearing of the PSS storage is required, then the CPU must write to each location individually.

[2013] 18.2 Implementation

[2014] The storage area of the PSS block will be implemented as a 128-byte register array. The array is located from PSS_base through to PSS_base+0x7F in the address map. The PSS block will only allow read or write accesses with supervisor data space permissions (i.e. cpu_acode[1:0]=11). All other accesses will result in pss_cpu_berr being asserted. The CPU subsystem bus slave interface is described in more detail in section 11.4.3.
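The access rule for the PSS can be modelled in a few lines of Python. This is a behavioural sketch only: the class and exception names are hypothetical, the 128 bytes are modelled as 32 words selected by address bits 6:2 (mirroring the cpu_adr[6:2] port of the block), and a bus error is represented as an exception.

```python
SUPERVISOR_DATA = 0b11  # cpu_acode value for a supervisor data access


class BusError(Exception):
    """Stands in for the assertion of pss_cpu_berr."""


class PssModel:
    def __init__(self):
        self._words = [0] * 32  # 128 bytes as 32 x 32-bit words

    @staticmethod
    def _index(offset, acode):
        if acode != SUPERVISOR_DATA:
            raise BusError("only supervisor data accesses are allowed")
        return (offset >> 2) & 0x1F  # decode address bits 6:2

    def write(self, offset, value, acode):
        self._words[self._index(offset, acode)] = value & 0xFFFFFFFF

    def read(self, offset, acode):
        return self._words[self._index(offset, acode)]

    def clear(self, acode=SUPERVISOR_DATA):
        # A reset does not clear the array, so each location is written.
        for i in range(32):
            self.write(i * 4, 0, acode)
```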

[2015] 18.2.1 Definitions of I/O

TABLE 101 PSS Block I/O

Port name | Pins | I/O | Description
Clocks and Resets
prst_n | 1 | In | Global reset. Synchronous to pclk, active low.
Pclk | 1 | In | Global clock
CPU Interface
cpu_adr[6:2] | 5 | In | CPU address bus. Only 5 bits are required to decode the address space for this block.
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
pss_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
cpu_pss_sel | 1 | In | Block select from the CPU. When cpu_pss_sel is high both cpu_adr and cpu_dataout are valid
pss_cpu_rdy | 1 | Out | Ready signal to the CPU. When pss_cpu_rdy is high it indicates the last cycle of the access. For a read cycle this means the data on pss_cpu_data is valid.
pss_cpu_berr | 1 | Out | PSS bus error signal to the CPU indicating an invalid access.

[2016] 19 Low Speed Serial Interface (LSS)

[2017] 19.1 Overview

[2018] The Low Speed Serial Interface (LSS) provides a mechanism for the internal SoPEC CPU to communicate with external QA chips via two independent LSS buses. The LSS communicates through the GPIO block to the QA chips. This allows the QA chip pins to be reused in multi-SoPEC environments. The LSS Master system-level interface is illustrated in FIG. 75. Note that multiple QA chips are allowed on each LSS bus.

[2019] 19.2 QA Communication

[2020] The SoPEC data interface to the QA Chips is a low speed, 2 pin, synchronous serial bus. Data is transferred to the QA chips via the lss_data pin synchronously with the lss_clk pin. When the lss_clk is high the data on lss_data is deemed to be valid. Only the LSS master in SoPEC can drive the lss_clk pin; this pin is an input only to the QA chips. The LSS block must be able to interface with an open-collector pull-up bus. This means that when the LSS block should transmit a logical zero it will drive 0 on the bus, but when it should transmit a logical 1 it will leave high-impedance on the bus (i.e. it doesn't drive the bus). If all the agents on the LSS bus adhere to this protocol then there will be no issues with bus contention.

[2021] The LSS block controls all communication to and from the QA chips. The LSS block is the bus master in all cases. The LSS block interprets a command register set by the SoPEC CPU, initiates transactions to the QA chip in question and optionally accepts return data. Any return information is presented through the configuration registers to the SoPEC CPU. The LSS block indicates to the CPU the completion of a command or the occurrence of an error via an interrupt. The LSS protocol can be used to communicate with other LSS slave devices (other than QA chips). However, should an LSS slave device hold the clock low (for whatever reason), it will be in violation of the LSS protocol and is not supported. The LSS clock is only ever driven by the LSS master.

[2022] 19.2.1 Start and Stop Conditions

[2023] All transmissions on the LSS bus are initiated by the LSS master issuing a START condition and terminated by the LSS master issuing a STOP condition. START and STOP conditions are always generated by the LSS master. As illustrated in FIG. 76, a START condition corresponds to a high to low transition on lss_data while lss_clk is high. A STOP condition corresponds to a low to high transition on lss_data while lss_clk is high.

[2024] 19.2.2 Data Transfer

[2025] Data is transferred on the LSS bus via a byte orientated protocol. Bytes are transmitted serially. Each byte is sent most significant bit (MSB) first through to least significant bit (LSB) last. One clock pulse is generated for each data bit transferred. Each byte must be followed by an acknowledge bit.

[2026] The data on lss_data must be stable during the HIGH period of the lss_clk clock. Data may only change when lss_clk is low. A transmitter outputs data after the falling edge of lss_clk and a receiver inputs the data at the rising edge of lss_clk. This data is only considered as a valid data bit at the next lss_clk falling edge provided a START or STOP is not detected in the period before the next lss_clk falling edge. All clock pulses are generated by the LSS block. The transmitter releases the lss_data line (high) during the acknowledge clock pulse (ninth clock pulse). The receiver must pull down the lss_data line during the acknowledge clock pulse so that it remains stable low during the HIGH period of this clock pulse.

[2027] Data transfers follow the format shown in FIG. 77. The first byte sent by the LSS master after a START condition is a primary id byte, where bits 7-2 form a 6-bit primary id (0 is a global id and will address all QA Chips on a particular LSS bus), bit 1 is an even parity bit for the primary id, and bit 0 forms the read/write sense. Bit 0 is high if the following command is a read to the primary id given or low for a write command to that id. An acknowledge is generated by the QA chip(s) corresponding to the given id (if such a chip exists) by driving the lss_data line low synchronous with the LSS master generated ninth lss_clk.
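The layout of the primary id byte can be sketched as follows. The function name is hypothetical, and the parity computation assumes the usual even-parity convention, namely that the parity bit is chosen so that the six id bits plus the parity bit contain an even number of ones; the text does not spell this out.

```python
def primary_id_byte(primary_id, read):
    """Build the first byte sent after a START condition.

    Bits 7-2: 6-bit primary id (0 is the global id).
    Bit 1:    even parity over the primary id (assumed convention).
    Bit 0:    1 for a read, 0 for a write.
    """
    if not 0 <= primary_id < 64:
        raise ValueError("primary id must fit in 6 bits")
    parity = bin(primary_id).count("1") & 1
    return (primary_id << 2) | (parity << 1) | (1 if read else 0)
```

For example, a read addressed to primary id 1 gives 0x07: the id in bit 2, the parity bit set because the id has a single one, and the read flag in bit 0.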

[2028] 19.2.3 Write Procedure

[2029] The protocol for a write access to a QA Chip over the LSS bus is illustrated in FIG. 79 below. The LSS master in SoPEC initiates the transaction by generating a START condition on the LSS bus. It then transmits the primary id byte with a 0 in bit 0 to indicate that the following command is a write to the primary id. An acknowledge is generated by the QA chip corresponding to the given primary id. The LSS master will clock out M data bytes with the slave QA Chip acknowledging each successful byte written. Once the slave QA chip has acknowledged the Mth data byte the LSS master issues a STOP condition to complete the transfer. The QA chip gathers the M data bytes together and interprets them as a command. See QA Chip Interface Specification for more details on the format of the commands used to communicate with the QA chip [8]. Note that the QA chip is free to not acknowledge any byte transmitted. The LSS master should respond by issuing an interrupt to the CPU to indicate this error. The CPU should then generate a STOP condition on the LSS bus to gracefully complete the transaction on the LSS bus.

[2030] 19.2.4 Read Procedure

[2031] The LSS master in SoPEC initiates the transaction by generating a START condition on the LSS bus. It then transmits the primary id byte with a 1 in bit 0 to indicate that the following command is a read to the primary id. An acknowledge is generated by the QA chip corresponding to the given primary id. The LSS master releases the lss_data bus and proceeds to clock the expected number of bytes from the QA chip with the LSS master acknowledging each successful byte read. The last expected byte is not acknowledged by the LSS master. It then completes the transaction by generating a STOP condition on the LSS bus. See QA Chip Interface Specification for more details on the format of the commands used to communicate with the QA chip [8].

[2032] 19.3 Implementation

[2033] A block diagram of the LSS master is given in FIG. 80. It consists of a block of configuration registers that are programmed by the CPU and two identical LSS master units that generate the signalling protocols on the two LSS buses as well as interrupts to the CPU. The CPU initiates and terminates transactions on the LSS buses by writing an appropriate command to the command register, writes bytes to be transmitted to a buffer and reads bytes received from a buffer, and checks the sources of interrupts by reading status registers.

[2034] 19.3.1 Definitions of IO

TABLE 102 LSS IO pins definitions

Port name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | System Clock
prst_n | 1 | In | System reset, synchronous active low
CPU Interface
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_adr[6:2] | 5 | In | CPU address bus. Only 5 bits are required to decode the address space for this block
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
cpu_acode[1:0] | 2 | In | CPU access code signals. cpu_acode[0] - Program (0)/Data (1) access; cpu_acode[1] - User (0)/Supervisor (1) access
cpu_lss_sel | 1 | In | Block select from the CPU. When cpu_lss_sel is high both cpu_adr and cpu_dataout are valid
lss_cpu_rdy | 1 | Out | Ready signal to the CPU. When lss_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the LSS block and for a read cycle this means the data on lss_cpu_data is valid.
lss_cpu_berr | 1 | Out | LSS bus error signal to the CPU.
lss_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
lss_cpu_debug_valid | 1 | Out | Active high. Indicates the presence of valid debug data on lss_cpu_data.
GPIO for LSS buses
lss_gpio_dout[1:0] | 2 | Out | LSS bus data output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
gpio_lss_din[1:0] | 2 | In | LSS bus data input. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_e[1:0] | 2 | Out | LSS bus data output enable, active high. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
lss_gpio_clk[1:0] | 2 | Out | LSS bus clock output. Bit 0 - LSS bus 0; Bit 1 - LSS bus 1
ICU interface
lss_icu_irq[1:0] | 2 | Out | LSS interrupt requests. Bit 0 - interrupt associated with LSS bus 0; Bit 1 - interrupt associated with LSS bus 1

[2035] 19.3.2 Configuration Registers

[2036] The configuration registers in the LSS block are programmed via the CPU interface. Refer to section 11.4 on page 69 for the description of the protocol and timing diagrams for reading and writing registers in the LSS block. Note that since addresses in SoPEC are byte aligned and the CPU only supports 32-bit register reads and writes, the lower 2 bits of the CPU address bus are not required to decode the address space for the LSS block. Table 103 lists the configuration registers in the LSS block. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of lss_cpu_data.

[2037] The input cpu_acode signal indicates whether the current CPU access is supervisor, user, program or data. The configuration registers in the LSS block can only be read or written by a supervisor data access, i.e. when cpu_acode equals b11. If the current access is a supervisor data access then the LSS responds by asserting lss_cpu_rdy for a single clock cycle.

[2038] If the current access is anything other than a supervisor data access, then the LSS generates a bus error by asserting lss_cpu_berr for a single clock cycle instead of lss_cpu_rdy as shown in section 11.4 on page 69. A write access will be ignored, and a read access will return zero.

TABLE 103 LSS Control Registers

Address (LSS_base +) | Register | #bits | Reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the LSS.
0x04 | LssClockHighLowDuration | 16 | 0x00C8 | Lss_clk has a 50:50 duty cycle; this register defines the period of lss_clk by means of specifying the duration (in pclk cycles) that lss_clk is low (or high). The reset value specifies transmission over the LSS bus at a nominal rate of 400 kHz, corresponding to a low (or high) duration of 200 pclk (160 MHz) cycles. Register should not be set to values less than 8.
0x08 | LssClocktoDataHold | 6 | 0x3 | Specifies the number of pclk cycles that Data must remain valid for after the falling edge of lss_clk. Minimum value is 3 cycles, and must be programmed to be less than LssClockHighLowDuration.
LSS bus 0 registers
0x10 | Lss0IntStatus | 3 | 0x0 | LSS bus 0 interrupt status registers. Bit 0 - command completed successfully; Bit 1 - error during processing of command, not-acknowledge received after transmission of primary id byte on LSS bus 0; Bit 2 - error during processing of command, not-acknowledge received after transmission of data byte on LSS bus 0. All the bits in Lss0IntStatus are cleared when the Lss0Cmd register gets written to. (Read only register)
0x14 | Lss0CurrentState | 4 | 0x0 | Gives the current state of the LSS bus 0 state machine. (Read only register) (Encoding will be specified upon state machine implementation)
0x18 | Lss0Cmd | 21 | 0x00_0000 | Command register defining sequence of events to perform on LSS bus 0 before interrupting CPU. A write to this register causes all the bits in the Lss0IntStatus register to be cleared as well as generating a lss0_new_cmd pulse.
0x1C - 0x2C | Lss0Buffer[4:0] | 5 × 32 | 0x0000_0000 | LSS Data buffer. Should be filled with transmit data before transmit command, or read data bytes received after a valid read command.
LSS bus 1 registers
0x30 | Lss1IntStatus | 3 | 0x0 | LSS bus 1 interrupt status registers. Bit 0 - command completed successfully; Bit 1 - error during processing of command, not-acknowledge received after transmission of primary id byte on LSS bus 1; Bit 2 - error during processing of command, not-acknowledge received after transmission of data byte on LSS bus 1. All the bits in Lss1IntStatus are cleared when the Lss1Cmd register gets written to. (Read only register)
0x34 | Lss1CurrentState | 4 | 0x0 | Gives the current state of the LSS bus 1 state machine. (Read only register) (Encoding will be specified upon state machine implementation)
0x38 | Lss1Cmd | 21 | 0x00_0000 | Command register defining sequence of events to perform on LSS bus 1 before interrupting CPU. A write to this register causes all the bits in the Lss1IntStatus register to be cleared as well as generating a lss1_new_cmd pulse.
0x3C - 0x4C | Lss1Buffer[4:0] | 5 × 32 | 0x0000_0000 | LSS Data buffer. Should be filled with transmit data before transmit command, or read data bytes received after a valid read command.
Debug registers
0x50 | LssDebugSel[6:2] | 5 | 0x00 | Selects register for debug output. This value is used as the input to the register decode logic instead of cpu_adr[6:2] when the LSS block is not being accessed by the CPU, i.e. when cpu_lss_sel is 0. The output lss_cpu_debug_valid is asserted to indicate that the data on lss_cpu_data is valid debug data. This data can be multiplexed onto chip pins during debug mode.
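The relationship between LssClockHighLowDuration and the LSS bus rate can be checked with a short calculation. The sketch below assumes the 160 MHz pclk quoted in the register description; the function names are hypothetical. Since the register gives the duration of one half period, the bus frequency is pclk divided by twice the register value.

```python
PCLK_HZ = 160_000_000  # pclk frequency quoted in the register description


def lss_clk_hz(high_low_duration, pclk_hz=PCLK_HZ):
    """LSS clock frequency for a given LssClockHighLowDuration value."""
    if not 8 <= high_low_duration <= 0xFFFF:
        raise ValueError("register is 16 bits wide and must not be below 8")
    return pclk_hz / (2 * high_low_duration)


def duration_for(target_hz, pclk_hz=PCLK_HZ):
    """Smallest register value whose bus rate does not exceed target_hz."""
    cycles = -(-pclk_hz // (2 * target_hz))  # ceiling division
    return max(8, cycles)
```

With the reset value 0x00C8 (200 cycles low, 200 cycles high) this gives 160 MHz / 400 = 400 kHz, matching the nominal rate stated in the table.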

[2039] 19.3.2.1 LSS Command Registers

[2040] The LSS command registers define a sequence of events to perform on the respective LSS bus before issuing an interrupt to the CPU. There is a separate command register and interrupt for each LSS bus. The format of the command is given in Table 104. The CPU writes to the command register to initiate a sequence of events on an LSS bus. Once the sequence of events has completed or an error has occurred, an interrupt is sent back to the CPU.

[2041] Some example commands are:

[2042] a single START condition (Start=1, IdByteEnable=0, RdWrEnable=0, Stop=0)

[2043] a single STOP condition (Start=0, IdByteEnable=0, RdWrEnable=0, Stop=1)

[2044] a START condition followed by transmission of the id byte (Start=1, IdByteEnable=1, RdWrEnable=0, Stop=0, IdByte contains primary id byte)

[2045] a write transfer of 20 bytes from the data buffer (Start=0, IdByteEnable=0, RdWrEnable=1, RdWrSense=0, Stop=0, TxRxByteCount=20)

[2046] a read transfer of 8 bytes into the data buffer (Start=0, IdByteEnable=0, RdWrEnable=1, RdWrSense=1, ReadNack=0, Stop=0, TxRxByteCount=8)

[2047] a complete read transaction of 16 bytes (Start=1, IdByteEnable=1, RdWrEnable=1, RdWrSense=1, ReadNack=1, Stop=1, IdByte contains primary id byte, TxRxByteCount=16), etc.

[2048] The CPU can thus program the number of bytes to be transmitted or received (up to a maximum of 20) on the LSS bus before it gets interrupted. This allows it to insert arbitrary delays in a transfer at a byte boundary. For example the CPU may want to transmit 30 bytes to a QA chip but insert a delay between the 20th and 21st bytes sent. It does this by first writing 20 bytes to the data buffer. It then writes a command to generate a START condition, send the primary id byte and then transmit the 20 bytes from the data buffer. When interrupted by the LSS block to indicate successful completion of the command the CPU can then write the remaining 10 bytes to the data buffer. It can then wait for a defined period of time before writing a command to transmit the 10 bytes from the data buffer and generate a STOP condition to terminate the transaction over the LSS bus.
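The splitting of a long write into buffer-sized commands can be sketched in Python as follows. The function name and the dictionary representation are hypothetical and only illustrate the sequencing; the field names follow the command register, and the delay between commands is left to the caller.

```python
MAX_BUFFER_BYTES = 20  # size of the LSS data buffer


def plan_write(data, max_chunk=MAX_BUFFER_BYTES):
    """Split a write payload into one command per buffer load.

    The first command carries the START condition and the id byte,
    the last carries the STOP condition. Each entry lists the command
    fields plus the bytes to load into the data buffer beforehand.
    """
    if not data:
        raise ValueError("nothing to transmit")
    chunks = [bytes(data[i:i + max_chunk]) for i in range(0, len(data), max_chunk)]
    plan = []
    for n, chunk in enumerate(chunks):
        first = n == 0
        last = n == len(chunks) - 1
        plan.append({
            "Start": int(first),
            "IdByteEnable": int(first),
            "RdWrEnable": 1,
            "RdWrSense": 0,  # write
            "Stop": int(last),
            "TxRxByteCount": len(chunk),
            "payload": chunk,
        })
    return plan
```

For a 30-byte payload this yields two commands, of 20 and 10 bytes, matching the example in the text.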

[2049] An interrupt to the CPU is generated for one cycle when any bit in LssNIntStatus is set. The CPU can read LssNIntStatus to discover the source of the interrupt. The LssNIntStatus registers are cleared when the CPU writes to the LssNCmd register. A null command write to the LssNCmd register will cause the LssNIntStatus registers to clear and no new command to start. A null command is defined as Start, IdByteEnable, RdWrEnable and Stop all set to zero.

TABLE 104 LSS command register description

bit(s) | name | Description
0 | Start | When 1, issue a START condition on the LSS bus.
1 | IdByteEnable | ID byte transmit enable: 1 - transmit byte in IdByte field; 0 - ignore byte in IdByte field
2 | RdWrEnable | Read/write transfer enable: 0 - ignore settings of RdWrSense, ReadNack and TxRxByteCount; 1 - if RdWrSense is 0, then perform a write transfer of TxRxByteCount bytes from the data buffer. If RdWrSense is 1, then perform a read transfer of TxRxByteCount bytes into the data buffer. Each byte should be acknowledged and the last byte received is acknowledged/not-acknowledged according to the setting of ReadNack.
3 | RdWrSense | Read/write sense indicator: 0 - write; 1 - read
4 | ReadNack | Indicates, for a read transfer, whether to issue an acknowledge or a not-acknowledge after the last byte received (indicated by TxRxByteCount). 0 - Issue acknowledge after last byte received; 1 - Issue not-acknowledge after last byte received.
5 | Stop | When 1, issue a STOP condition on the LSS bus.
7:6 | reserved | Must be 0
15:8 | IdByte | Byte to be transmitted if IdByteEnable is 1. Bit 8 corresponds to the LSB.
20:16 | TxRxByteCount | Number of bytes to be transmitted from the data buffer or the number of bytes to be received into the data buffer. The maximum value that should be programmed is 20, as the size of the data buffer is 20 bytes. Valid values are 1 to 20; 0 is valid when RdWrEnable = 0; other cases are invalid and undefined.
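The bit layout of Table 104 can be turned into a small packing helper. This is a sketch with a hypothetical function name; it only places the fields at the bit positions given in the table and rejects byte counts outside the ranges the table calls valid.

```python
def encode_lss_cmd(start=0, id_byte_enable=0, rdwr_enable=0, rdwr_sense=0,
                   read_nack=0, stop=0, id_byte=0, byte_count=0):
    """Pack the LssNCmd fields into a 21-bit command word (Table 104)."""
    if not 0 <= id_byte <= 0xFF:
        raise ValueError("IdByte is 8 bits wide")
    lowest = 1 if rdwr_enable else 0  # 0 is only valid when RdWrEnable = 0
    if not lowest <= byte_count <= 20:
        raise ValueError("TxRxByteCount out of range")
    return ((start & 1)
            | (id_byte_enable & 1) << 1
            | (rdwr_enable & 1) << 2
            | (rdwr_sense & 1) << 3
            | (read_nack & 1) << 4
            | (stop & 1) << 5   # bits 7:6 are reserved and left at 0
            | id_byte << 8
            | byte_count << 16)
```

The null command encodes to 0, and the complete 16-byte read transaction from the examples above, with an id byte of 0x07, encodes to 0x10073F.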

[2050] The data buffer is implemented in the LSS master block. When the CPU writes to the LssNBuffer registers the data written is presented to the LSS master block via the lssN_buffer_wrdata bus and the configuration registers block pulses the lssN_buffer_wen bit corresponding to the register written. For example, if LssNBuffer[2] is written to, lssN_buffer_wen[2] will be pulsed. When the CPU reads the LssNBuffer registers the configuration registers block reflects the lssN_buffer_rdata bus back to the CPU.

[2051] 19.3.3 LSS Master Unit

[2052] The LSS master unit is instantiated for both LSS bus 0 and LSS bus 1. It controls transactions on the LSS bus by means of the state machine shown in FIG. 83, which interprets the commands that are written by the CPU. It also contains a single 20 byte data buffer used for transmitting and receiving data.

[2053] The CPU can write data to be transmitted on the LSS bus by writing to the LssNBuffer registers. It can also read data that the LSS master unit receives on the LSS bus by reading the same registers. The LSS master always transmits or receives bytes to or from the data buffer in the same order.

[2054] For a transmit command, LssNBuffer[0][7:0] gets transmitted first, then LssNBuffer[0][15:8], LssNBuffer[0][23:16], LssNBuffer[0][31:24], LssNBuffer[1][7:0] and so on until TxRxByteCount number of bytes are transmitted. A receive command fills data to the buffer in the same order. The buffer start point is reset for each new command.
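The byte ordering within the LssNBuffer registers amounts to little-endian packing of the byte stream into five 32-bit words. The following sketch, with hypothetical helper names, shows how software might prepare the register values for a transmit and recover the byte stream after a receive.

```python
BUFFER_BYTES = 20  # five 32-bit LssNBuffer registers


def pack_buffer(data):
    """Return the five register values for a transmit payload.

    Byte 0 goes to LssNBuffer[0][7:0], byte 1 to LssNBuffer[0][15:8],
    and so on; unused bytes are written as zero.
    """
    if len(data) > BUFFER_BYTES:
        raise ValueError("payload exceeds the 20 byte data buffer")
    padded = bytes(data) + bytes(BUFFER_BYTES - len(data))
    return [int.from_bytes(padded[i:i + 4], "little")
            for i in range(0, BUFFER_BYTES, 4)]


def unpack_buffer(words, count):
    """Recover the first count received bytes from the register values."""
    raw = b"".join(w.to_bytes(4, "little") for w in words)
    return raw[:count]
```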

[2055] All state machine outputs, flags and counters are cleared on reset. After a reset the state machine goes to the Reset state and initialises the LSS pins (lss_clk is set to 1, lss_data is tristated and allowed to be pulled up to 1). When the reset condition is removed the state machine transitions to the Wait state.

[2056] It remains in the Wait state until lss_new_cmd equals 1. If the Start bit of the command is 0 the state machine proceeds directly to the CheckIdByteEnable state. If the Start bit is 1 it proceeds to the GenerateStart state and issues a START condition on the LSS bus.

[2057] In the CheckIdByteEnable state, if the IdByteEnable bit of the command is 0 the state machine proceeds directly to the CheckRdWrEnable state. If the IdByteEnable bit is 1 the state machine enters the SendIdByte state and the byte in the IdByte field of the command is transmitted on the LSS. The WaitForIdAck state is then entered. If the byte is acknowledged, the state machine proceeds to the CheckRdWrEnable state. If the byte is not-acknowledged, the state machine proceeds to the GenerateInterrupt state and issues an interrupt to indicate a not-acknowledge was received after transmission of the primary id byte.

[2058] In the CheckRdWrEnable state, if the RdWrEnable bit of the command is 0 the state machine proceeds directly to the CheckStop state. If the RdWrEnable bit is 1, count is loaded with the value of the TxRxByteCount field of the command and the state machine enters either the ReceiveByte state if the RdWrSense bit of the command is 1 or the TransmitByte state if the RdWrSense bit is 0.

[2059] For a write transaction, the state machine keeps transmitting bytes from the data buffer, decrementing count after each byte transmitted, until count is 1. If all the bytes are successfully transmitted the state machine proceeds to the CheckStop state. If the slave QA chip not-acknowledges a transmitted byte, the state machine indicates this error by issuing an interrupt to the CPU and then entering the GenerateInterrupt state.

[2060] For a read transaction, the state machine keeps receiving bytes into the data buffer, decrementing count after each byte received, until count is 1. After each byte received the LSS master must issue an acknowledge. After the last expected byte (i.e. when count is 1) the state machine checks the ReadNack bit of the command to see whether it must issue an acknowledge or not-acknowledge for that byte. The CheckStop state is then entered.

[2061] In the CheckStop state, if the Stop bit of the command is 0 the state machine proceeds directly to the GenerateInterrupt state. If the Stop bit is 1 it proceeds to the GenerateStop state and issues a STOP condition on the LSS bus before proceeding to the GenerateInterrupt state. In both cases an interrupt is issued to indicate successful completion of the command.

[2062] The state machine then enters the Wait state to await the next command. When the state machine reenters the Wait state the output pins (lss_data and lss_clk) are not changed; they retain the state of the last command. This allows the possibility of multi-command transactions. The CPU may abort the current transfer at any time by performing a write to the Reset register of the LSS block.
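The command flow described above can be summarised as a state trace. The Python sketch below is a simplification under stated assumptions, not the state machine of FIG. 83: the function name and arguments are hypothetical, GenerateStart is assumed to be followed by CheckIdByteEnable, per-byte acknowledge handling on writes is folded into TransmitByte, and SendAck/SendNack are taken from the data reception description.

```python
def trace_states(start, id_byte_enable, rdwr_enable, rdwr_sense, read_nack,
                 stop, byte_count=0, id_acked=True, write_nack_at=None):
    """List the states visited while one command is processed.

    id_acked models the slave's response to the id byte; write_nack_at is
    the index of a data byte the slave refuses to acknowledge (or None).
    """
    error_exit = ["GenerateInterrupt", "Wait"]
    states = ["Wait"]
    if start:
        states.append("GenerateStart")
    states.append("CheckIdByteEnable")
    if id_byte_enable:
        states += ["SendIdByte", "WaitForIdAck"]
        if not id_acked:
            return states + error_exit
    states.append("CheckRdWrEnable")
    if rdwr_enable:
        for n in range(byte_count):
            if rdwr_sense:  # read transfer
                last = n == byte_count - 1
                states.append("ReceiveByte")
                states.append("SendNack" if last and read_nack else "SendAck")
            else:  # write transfer
                states.append("TransmitByte")
                if write_nack_at == n:
                    return states + error_exit
    states.append("CheckStop")
    if stop:
        states.append("GenerateStop")
    return states + ["GenerateInterrupt", "Wait"]
```

A command with only the Start bit set, for example, passes through GenerateStart and the three check states before raising its completion interrupt.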

[2063] 19.3.3.1 START and STOP Generation

[2064] START and STOP conditions, which signal the beginning and end of data transmission, occur when the LSS master generates a falling and rising edge respectively on the data while the clock is high.

[2065] In the GenerateStart state, lss_gpio_clk is held high with lss_gpio_e remaining deasserted (so the data line is pulled high externally) for LssClockHighLowDuration pclk cycles. Then lss_gpio_e is asserted and lss_gpio_dout is pulled low (to drive a 0 on the data line, creating a falling edge) with lss_gpio_clk remaining high for another LssClockHighLowDuration pclk cycles. In the GenerateStop state, both lss_gpio_clk and lss_gpio_dout are pulled low followed by the assertion of lss_gpio_e to drive a 0 while the clock is low. After LssClockHighLowDuration pclk cycles, lss_gpio_clk is set high. After a further LssClockHighLowDuration pclk cycles, lss_gpio_e is deasserted to release the data bus and create a rising edge on the data bus during the high period of the clock.

[2066] If the bus is not in the required state for start and stop generation (lss_clk=1, lss_data=1 for start, and lss_clk=1, lss_data=0 for stop), the state machine moves the bus to the correct state and proceeds as described above. FIG. 82 shows the transition timing from any bus state to start and stop generation.

[2067] 19.3.3.2 Clock Pulse Generation

[2068] The LSS master holds lss_gpio_clk high while the LSS bus is inactive. A clock pulse is generated for each bit transmitted or received over the LSS bus. It is generated by first holding lss_gpio_clk low for LssClockHighLowDuration pclk cycles, and then high for LssClockHighLowDuration pclk cycles.

[2069] 19.3.3.3 Data De-Glitching

[2070] When data is received in the LSS block it is passed to a de-glitching circuit. The de-glitch circuit samples the data 3 times on pclk and compares the samples. If all 3 samples are the same then the data is passed, otherwise the data is ignored.

[2071] Note that the LSS data input on SoPEC is double registered in the GPIO block before being passed to the LSS.
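The three-sample comparison can be sketched as follows. This is an illustrative model under stated assumptions: the samples are taken on consecutive pclk cycles through a sliding window, and "ignored" is interpreted as holding the previously passed value. The function name and the initial value of 1 (an idle, pulled-up line) are hypothetical.

```python
def deglitch(samples, last_value=1):
    """Pass a new value only when three consecutive samples agree."""
    window = []
    value = last_value
    out = []
    for sample in samples:
        # Keep only the three most recent samples.
        window = (window + [sample])[-3:]
        if len(window) == 3 and window[0] == window[1] == window[2]:
            value = sample
        # Otherwise the disagreeing samples are ignored and the old value held.
        out.append(value)
    return out
```

A single-cycle glitch never changes the output, while a level held for three cycles does.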

[2072] 19.3.3.4 Data Reception

[2073] The input data, gpio_lss_di, is first synchronised to the pclk domain by means of two flip-flops clocked by pclk (the double register resides in the GPIO block). The LSS master generates a clock pulse for each bit received. The output lss_gpio_e is deasserted LssClockToDataHold pclk cycles after the falling edge of lss_gpio_clk to release the data bus. The value on the synchronised gpio_lss_di is sampled Tstrobe number of clock cycles after the rising edge of lss_gpio_clk (the data is de-glitched over a further 3 stage register to avoid possible glitch detection). See FIG. 84 for further timing information.

[2074] In the ReceiveByte state, the state machine generates 8 clock pulses. At each Tstrobe time after the rising edge of lss_gpio_clk the synchronised gpio_lss_di is sampled. The first bit sampled is LssNBuffer[0][7], the second LssNBuffer[0][6], and so on down to LssNBuffer[0][0]. For each byte received the state machine either sends a NAK or an ACK depending on the command configuration and the number of bytes received.

[2075] In the SendNack state the state machine generates a single clock pulse. lss_gpio_e is deasserted and the LSS data line is pulled high externally to issue a not-acknowledge.

[2076] In the SendAck state the state machine generates a single clock pulse. lss_gpio_e is asserted and a 0 driven on lss_gpio_dout after the lss_gpio_clk falling edge to issue an acknowledge.

[2077] 19.3.3.5 Data Transmission

[2078] The LSS master generates a clock pulse for each bit transmitted. Data is output on the LSS bus on the falling edge of lss_gpio_clk.

[2079] When the LSS master drives a logical zero on the bus it will assert lss_gpio_e and drive a 0 on lss_gpio_dout after the lss_gpio_clk falling edge. lss_gpio_e will remain asserted and lss_gpio_dout will remain low until the next lss_clk falling edge.

[2080] When the LSS master drives a logical one, lss_gpio_e should be deasserted at the lss_gpio_clk falling edge and remain deasserted at least until the next lss_gpio_clk falling edge. This is because the LSS bus will be externally pulled up to logical one via a pull-up resistor.

[2081] In the SendId byte state, the state machine generates 8 clock pulses to transmit the byte in the IdByte field of the current valid command. On each falling edge of lss_gpio_clk a bit is driven on the data bus as outlined above. On the first falling edge IdByte[7] is driven on the data bus, on the second falling edge IdByte[6] is driven out, etc.

[2082] In the TransmitByte state, the state machine generates 8 clock pulses to transmit the byte at the output of the transmit FIFO. On each falling edge of lss_gpio_clk a bit is driven on the data bus as outlined above. On the first falling edge LssNBuffer[0][7] is driven on the data bus, on the second falling edge LssNBuffer[0][6] is driven out, and so on down to LssNBuffer[0][0].

[2083] In the WaitForAck state, the state machine generates a single clock pulse. At Tstrobe time after the rising edge of lss_gpio_clk the synchronised gpio_lss_di is sampled. A 0 indicates an acknowledge and ack_detect is pulsed; a 1 indicates a not-acknowledge and nack_detect is pulsed.
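The most-significant-bit-first ordering used for both transmission and reception can be illustrated as follows. This is a minimal sketch; the helper names are hypothetical and the model ignores clocking and the open-drain drive scheme.

```python
def byte_to_bits_msb_first(value):
    """Split a byte into the 8 bits driven on successive falling clock edges."""
    # Bit 7 goes out on the first edge, bit 0 on the eighth.
    return [(value >> bit) & 1 for bit in range(7, -1, -1)]


def bits_msb_first_to_byte(bits):
    """Rebuild a byte from 8 sampled bits, the first sample landing in bit 7."""
    value = 0
    for bit in bits:
        value = (value << 1) | (bit & 1)
    return value
```

For example, the byte 0xA5 is driven as the bit sequence 1, 0, 1, 0, 0, 1, 0, 1, and sampling the same sequence reconstructs 0xA5.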

[2084] 19.3.3.6 Data Rate Control

[2085] The CPU can control the data rate by setting the clock period of the LSS bus clock by programming an appropriate value in LssClockHighLowDuration. The default setting for the register is 200 (pclk cycles), which corresponds to a transmission rate of 400 kHz on the LSS bus (lss_clk is high for LssClockHighLowDuration cycles, then low for LssClockHighLowDuration cycles). The lss_clk will always have a 50:50 duty cycle. The LssClockHighLowDuration register should not be set to values less than 8.

[2086] The hold time of lss_data after the falling edge of lss_clk is programmable by the LssClocktoDataHold register. This register should not be programmed to less than 2 or greater than the LssClockHighLowDuration value.
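The relationship between the register value and the bus rate can be checked with a short calculation. The following is a sketch under the assumption that pclk runs at 160 MHz, which is the frequency implied by the stated default (200 cycles high plus 200 cycles low giving 400 kHz). The function and constant names are hypothetical.

```python
MIN_HIGH_LOW_DURATION = 8  # stated lower limit for LssClockHighLowDuration
MIN_DATA_HOLD = 2          # stated lower limit for LssClocktoDataHold


def lss_bus_rate_hz(pclk_hz, high_low_duration):
    """LSS clock frequency for a 50:50 clock of 2 * duration pclk cycles."""
    if high_low_duration < MIN_HIGH_LOW_DURATION:
        raise ValueError("LssClockHighLowDuration should not be less than 8")
    return pclk_hz / (2 * high_low_duration)


def data_hold_is_valid(data_hold, high_low_duration):
    """Check the hold value against the stated programming limits."""
    return MIN_DATA_HOLD <= data_hold <= high_low_duration
```

With pclk_hz = 160_000_000 and the default duration of 200, the function returns 400000.0, matching the 400 kHz figure.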

[2087] 19.3.3.7 LSS Master Timing Parameters

[2088] The LSS master timing parameters are shown in FIG. 84 and the associated values are shown in Table 105.

TABLE 105 LSS master timing parameters
Parameter | Description | Value (min, nom, max as listed) | Unit
LSS Master Driving
Tp | LSS clock period divided by 2 | min 8, nom 200, max FFFF | pclk cycles
Tstart_delay | Time to start data edge from rising clock edge | Tp + LssClocktoDataHold | pclk cycles
Tstop_delay | Time to stop data edge from rising clock edge | Tp + LssClocktoDataHold | pclk cycles
Tdata_setup | Time from data setup to rising clock edge | Tp − 2 − LssClocktoDataHold | pclk cycles
Tdata_hold | Time from falling clock edge to data hold | LssClocktoDataHold | pclk cycles
Tack_setup | Time that outgoing (N)Ack is setup before lss_clk rising edge | Tp − 2 − LssClocktoDataHold | pclk cycles
Tack_hold | Time that outgoing (N)Ack is held after lss_clk falling edge | LssClocktoDataHold | pclk cycles
LSS Master Sampling
Tstrobe | LSS master strobe point for incoming data and (N)Ack values | Tp − 2, Tp − 2 | pclk cycles

[2089] DRAM Subsystem

[2090] 20 DRAM Interface Unit (DIU)

[2091] 20.1 Overview

[2092] FIG. 85 shows how the DIU provides the interface between the on-chip 20 Mbit embedded DRAM and the rest of SoPEC. In addition to outlining the functionality of the DIU, this chapter provides a top-level overview of the memory storage and access patterns of SoPEC and the buffering required in the various SoPEC blocks to support those access requirements.

[2093] The main functionality of the DIU is to arbitrate between requests for access to the embedded DRAM and provide read or write accesses to the requesters. The DIU must also implement the initialisation sequence and refresh logic for the embedded DRAM.

[2094] The arbitration scheme uses a fully programmable timeslot mechanism for non-CPU requesters to meet the bandwidth and latency requirements for each unit, with unused slots re-allocated to provide best effort accesses. The CPU is allowed high priority access, giving it minimum latency, but allowing bounds to be placed on its bandwidth consumption.

[2095] The interface between the DIU and the SoPEC requesters is similar to the interface on PEC1, i.e. separate control, read data and write data busses.

[2096] The embedded DRAM is used principally to store:

[2097] CPU program code and data.

[2098] PEP (re)programming commands.

[2099] Compressed pages containing contone, bi-level and raw tag data and header information.

[2100] Decompressed contone and bi-level data.

[2101] Dotline store during a print.

[2102] Print setup information such as tag format structures, dither matrices and dead nozzle information.

[2103] 20.2 IBM Cu-11 Embedded DRAM

[2104] 20.2.1 Single Bank

[2105] SoPEC will use the 1.5 V core voltage option in IBM's 0.13 μm class Cu-11 process.

[2106] The random read/write cycle time and the refresh cycle time are each 3 cycles at 160 MHz [16]. An open page access will complete in 1 cycle if the page mode select signal is clocked at 320 MHz or 2 cycles if the page mode select signal is clocked every 160 MHz cycle. The page mode select signal will be clocked at 160 MHz in SoPEC in order to simplify timing closure. The DRAM word size is 256 bits.

[2107] Most SoPEC requesters will make single 256 bit DRAM accesses (see Section 20.4). These accesses will take 3 cycles as they are random accesses, i.e. they will most likely be to a different memory row than the previous access.

[2108] The entire 20 Mbit DRAM will be implemented as a single memory bank. In Cu-11, the maximum single instance size is 16 Mbit. The first 1 Mbit tile of each instance contains an area overhead so the cheapest solution in terms of area is to have only 2 instances. 16 Mbit and 4 Mbit instances would together consume an area of 14.63 mm², as would 2 times 10 Mbit instances. 4 times 5 Mbit instances would require 17.2 mm².

[2109] The instance size will determine the frequency of refresh. Each refresh requires 3 clock cycles. In Cu-11 each row consists of 8 columns of 256-bit words. This means that 10 Mbit requires 5120 rows. A complete DRAM refresh is required every 3.2 ms. Two times 10 Mbit instances would require a refresh every 100 clock cycles, if the instances are refreshed in parallel.
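The row count and refresh interval quoted above follow from simple arithmetic, which can be reproduced as below. This is a worked check rather than part of the design; it assumes 1 Mbit = 2^20 bits and a 160 MHz clock, and the names are hypothetical.

```python
MBIT = 1 << 20      # assumption: 1 Mbit = 2**20 bits
WORD_BITS = 256     # DRAM word size
WORDS_PER_ROW = 8   # columns of 256-bit words per row in Cu-11


def rows_per_instance(instance_mbit):
    """Number of rows in one DRAM instance of the given size."""
    return (instance_mbit * MBIT) // (WORD_BITS * WORDS_PER_ROW)


def cycles_between_refreshes(instance_mbit, refresh_period_us, clock_mhz):
    """Clock cycles available per row refresh when one row is refreshed at a time."""
    total_cycles = refresh_period_us * clock_mhz  # microseconds * cycles per microsecond
    return total_cycles // rows_per_instance(instance_mbit)
```

For a 10 Mbit instance, rows_per_instance(10) gives 5120, and cycles_between_refreshes(10, 3200, 160) gives 100, in agreement with the figures in the text.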

[2110] The SoPEC DRAM will be constructed as two 10 Mbit instances implemented as a single memory bank.

[2111] 20.3 SoPEC Memory Usage Requirements

[2112] The memory usage requirements for the embedded DRAM are shown in Table 106.

TABLE 106 Memory Usage Requirements
Block | Size | Description
Compressed page store | 2048 Kbytes | Compressed data page store for Bi-level and contone data
Decompressed Contone Store | 108 Kbyte | 13824 lines with scale factor 6 = 2304 pixels, store 12 lines, 4 colors = 108 kB. 13824 lines with scale factor 5 = 2765 pixels, store 12 lines, 4 colors = 130 kB
Spot line store | 5.1 Kbyte | 13824 dots/line so 3 lines is 5.1 kB
Tag Format Structure | Typically 12 Kbyte (2.5 mm tags @ 800 dpi) | 55 kB in for 384 dot line tags. 2.5 mm tags (1/10th inch) @ 1600 dpi require 160 dot lines = 160/384 × 55 or 23 kB. 2.5 mm tags (1/10th inch) @ 800 dpi require 80/384 × 55 = 12 kB
Dither Matrix store | 4 Kbytes | 64 × 64 dither matrix is 4 kB. 128 × 128 dither matrix is 16 kB. 256 × 256 dither matrix is 64 kB
DNC Dead Nozzle Table | 1.4 Kbytes | Delta encoded, (10 bit delta position + 6 dead nozzle mask) × % Dnozzle. 5% dead nozzles requires (10 + 6) × 692 Dnozzles = 1.4 Kbytes
Dot-line store | 369.6 Kbytes | Assume each color row is separated by 5 dot lines on the print head. The dot line store will be 0 + 5 + 10 . . . 50 + 55 = 330 half dot lines + 48 extra half dot lines (4 per dot row) + 60 extra half dot lines estimated to account for printhead misalignment = 438 half dot lines. 438 half dot lines of 6912 dots = 369.6 Kbytes
PCU Program code | 8 Kbytes | 1024 commands of 64 bits = 8 KB
CPU | 64 Kbytes | Program code and data
TOTAL | 2620 Kbytes (12 Kbyte TFS storage) |
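Several of the sizes in Table 106 can be re-derived from the quantities given in the Description column. The following is a checking sketch only. It assumes 1 Kbyte = 1024 bytes, one byte per colour per contone pixel and one bit per dot, and all names are hypothetical.

```python
KBYTE = 1024  # assumption: 1 Kbyte = 1024 bytes


def contone_store_bytes(dots_per_line, scale_factor, lines=12, colors=4):
    """Decompressed contone store, assuming one byte per colour per pixel."""
    pixels = dots_per_line // scale_factor
    return pixels * lines * colors


def dot_line_store_half_lines(color_rows=12, spacing=5, extra_per_row=4, misalignment=60):
    """Half dot lines: 0 + 5 + ... + 55, plus 4 per dot row, plus the misalignment margin."""
    separation = sum(range(0, color_rows * spacing, spacing))
    return separation + extra_per_row * color_rows + misalignment


def dot_line_store_bytes(half_lines, dots_per_half_line=6912):
    """Dot-line store size at one bit per dot."""
    return half_lines * dots_per_half_line // 8


# Sizes in Kbytes as listed in the Size column of Table 106.
TABLE_106_KBYTES = [2048, 108, 5.1, 12, 4, 1.4, 369.6, 8, 64]
```

Under these assumptions the scale factor 6 contone store evaluates to 108 Kbytes, the dot-line store to 438 half dot lines and approximately 369.6 Kbytes, and the listed sizes sum to approximately 2620 Kbytes.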

[2113] 20.4 SoPEC Memory Access Patterns

[2114] Table 107 shows a summary of the blocks on SoPEC requiring access to the embedded DRAM and their individual memory access patterns. Most blocks will access the DRAM in single 256-bit accesses. All accesses must be padded to 256-bits except for 64-bit CDU write accesses and CPU write accesses. Bits which should not be written are masked using the individual DRAM bit write inputs or byte write inputs, depending on the foundry. Using single 256-bit accesses means that the buffering required in the SoPEC DRAM requesters will be minimized.

TABLE 107 Memory access patterns of SoPEC DRAM Requesters
DRAM requester | Direction | Memory access pattern
CPU | R | Single 256-bit reads.
CPU | W | Single 32-bit, 16-bit or 8-bit writes.
SCB | R | Single 256-bit reads.
SCB | W | Single 256-bit writes, with byte enables.
CDU | R | Single 256-bit reads of the compressed contone data.
CDU | W | Each CDU access is a write to 4 consecutive DRAM words in the same row but only 64 bits of each word are written with the remaining bits write masked. The access time for this 4 word page mode burst is 3 + 2 + 2 + 2 = 9 cycles if the page mode select signal is clocked at 160 MHz.
CFU | R | Single 256 bit reads.
LBD | R | Single 256 bit reads.
SFU | R | Separate single 256 bit reads for previous and current line but sharing the same DIU interface.
SFU | W | Single 256 bit writes.
TE(TD) | R | Single 256 bit reads. Each read returns 2 times 128 bit tags.
TE(TFS) | R | Single 256 bit reads. TFS is 136 bytes. This means there is unused data in the fifth 256 bit read. A total of 5 reads is required.
HCU | R | Single 256 bit reads. 128 × 128 dither matrix requires 4 reads per line with double buffering. 256 × 256 dither matrix requires 8 reads at the end of the line with single buffering.
DNC | R | Single 256 bit dead nozzle table reads. Each dead nozzle table read contains 16 dead-nozzle table entries each of 10 delta bits plus 6 dead nozzle mask bits.
DWU | W | Single 256 bit writes since enable/disable DRAM access per color plane.
LLU | R | Single 256 bit reads since enable/disable DRAM access per color plane.
PCU | R | Single 256 bit reads. Each PCU command is 64 bits so each 256 bit word can contain 4 PCU commands. PCU reads from DRAM used for reprogramming PEP should be executed with minimum latency. If this occurs between pages then there will be free bandwidth as most of the other SoPEC Units will not be requesting from DRAM. If this occurs between bands then the LBD, CDU and TE bandwidth will be free. So the PCU should have a high priority to access any spare bandwidth.
Refresh | | Single refresh.

[2115] 20.5 Buffering Required in SoPEC DRAM Requesters

[2116] If each DIU access is a single 256-bit access then we need to provide a 256-bit double buffer in the DRAM requester. If the DRAM requester has a 64-bit interface then this can be implemented as an 8×64-bit FIFO.

TABLE 108 Buffer sizes in SoPEC DRAM requesters
DRAM Requester | Direction | Access patterns | Buffering required in block
CPU | R | Single 256-bit reads. | Cache.
CPU | W | Single 32-bit writes but allowing 16-bit or byte addressable writes. | None.
SCB | R | Single 256-bit reads. | Double 256-bit buffer.
SCB | W | Single 256-bit writes, with byte enables. | Double 256-bit buffer.
CDU | R | Single 256-bit reads of the compressed contone data. | Double 256-bit buffer.
CDU | W | Each CDU access is a write to 4 consecutive DRAM words in the same row but only 64 bits of each word are written with the remaining bits write masked. | Double half JPEG block buffer.
CFU | R | Single 256 bit reads. | Triple 256-bit buffer.
LBD | R | Single 256 bit reads. | Double 256-bit buffer.
SFU | R | Separate single 256 bit reads for previous and current line but sharing the same DIU interface. | Double 256-bit buffer for each read channel.
SFU | W | Single 256 bit writes. | Double 256-bit buffer.
TE(TD) | R | Single 256 bit reads. | Double 256-bit buffer.
TE(TFS) | R | Single 256 bit reads. TFS is 136 bytes. This means there is unused data in the fifth 256 bit read. A total of 5 reads is required. | Double line-buffer for 136 bytes implemented in TE.
HCU | R | Single 256 bit reads. 128 × 128 dither matrix requires 4 reads per line with double buffering. 256 × 256 dither matrix requires 8 reads at the end of the line with single buffering. | Configurable between double 128 byte buffer and single 256 byte buffer.
DNC | R | Single 256 bit reads. | Double 256-bit buffer. Deeper buffering could be specified to cope with local clusters of dead nozzles.
DWU | W | Single 256 bit writes per enabled odd/even color plane. | Double 256-bit buffer per color plane.
LLU | R | Single 256 bit reads per enabled odd/even color plane. | Double 256-bit buffer per color plane.
PCU | R | Single 256 bit reads. Each PCU command is 64 bits so each 256 bit DRAM read can contain 4 PCU commands. Requested command is read from DRAM together with the next 3 contiguous 64-bits which are cached to avoid unnecessary DRAM reads. | Single 256-bit buffer.
Refresh | | Single refresh. | None.

[2117] 20.6 SoPEC DIU Bandwidth Requirements

TABLE 109 SoPEC DIU Bandwidth Requirements
Block Name | Direction | Number of cycles between each 256-bit DRAM access to meet peak bandwidth | Peak Bandwidth which must be supplied (bits/cycle) | Average Bandwidth (bits/cycle) | Example number of allocated timeslots¹
CPU | R | | | |
CPU | W | | | |
SCB | R | | | |
SCB | W | 348² | 0.734 | 0.393³ | 1
CDU | R | 128 (SF = 4), 288 (SF = 6), 1:1 compression⁴ | 64/n² (SF = n), 1.8 (SF = 6), 4 (SF = 4) (1:1 compression) | 32/10*n² (SF = n), 0.09 (SF = 6), 0.2 (SF = 4) (10:1 compression)⁵ | 1 (SF = 6), 2 (SF = 4)
CDU | W | For individual accesses: 16 cycles (SF = 4), 36 cycles (SF = 6), n² cycles (SF = n). Will be implemented as a page mode burst of 4 accesses every 64 cycles (SF = 4), 144 (SF = 6), 4*n² (SF = n) cycles⁶ | 64/n² (SF = n), 1.8 (SF = 6), 4 (SF = 4) | 32/n² (SF = n)⁷, 0.9 (SF = 6), 2 (SF = 4) | 2 (SF = 6)⁸, 4 (SF = 4)
CFU | R | 32 (SF = 4), 48 (SF = 6)⁹ | 32/n (SF = n), 5.4 (SF = 6), 8 (SF = 4) | 32/n (SF = n), 5.4 (SF = 6), 8 (SF = 4) | 6 (SF = 6), 8 (SF = 4)
LBD | R | 256 (1:1 compression)¹⁰ | 1 (1:1 compression) | 0.1 (10:1 compression)¹¹ | 1
SFU | R | 128¹² | 2 | 2 | 2
SFU | W | 256¹³ | 1 | 1 | 1
TE(TD) | R | 252¹⁴ | 1.02 | 1.02 | 1
TE(TFS) | R | 5 reads per line¹⁵ | 0.093 | 0.093 | 0
HCU | R | 4 reads per line for 128 × 128 dither matrix¹⁶ | 0.074 | 0.074 | 0
DNC | R | 106 (5% dead-nozzles 10-bit delta encoded)¹⁷ | 2.4 (clump of dead nozzles) | 0.8 (equally spaced dead nozzles) | 3
DWU | W | 6 writes every 256¹⁸ | 6 | 6 | 6
LLU | R | 8 reads every 256¹⁹ | 8 | 6 | 8
PCU | R | 256²⁰ | 1 | 1 | 1
Refresh | | 100²¹ | 2.56 | 2.56 | 3 (effective)
TOTAL | | | SF = 6: 34.9, SF = 4: 41.9, excluding CPU | SF = 6: 27.5, SF = 4: 31.2, excluding CPU | SF = 6: 36, SF = 4: 41, excluding CPU

[2118] Since 64 valid bits are written per 256-bit write, the DRAM is accessed every SF² cycles, i.e. at SF4 an access every 16 cycles, at SF6 an access every 36 cycles.

[2119] If a page mode burst of 4 accesses is used then the burst takes 3+2+2+2 = 9 cycles. This means that at scale factor SF, a set of 4 back-to-back accesses must occur every 4*SF² cycles. This assumes the page mode select signal is clocked at 160 MHz. CDU timeslots therefore take 9 cycles.

[2120] For scale factors lower than 4, double buffering will be used.

[2121] 7: The peak bandwidth is twice the average bandwidth in the case of 1.5 buffering.

[2122] 8: Each CDU(W) burst takes 9 cycles instead of 4 cycles for other accesses so CDU timeslots are longer.

[2123] 9: 4 color pixel (32 bits) read by CFU every SF cycles. At SF4, 32 bits is required every 4 cycles or 256 bits every 32 cycles. At SF6, 32 bits every 6 cycles or 256 bits every 48 cycles.

[2124] 10: At 1:1 compression require 1 bit/cycle or 256 bits every 256 cycles.

[2125] 11: The average bandwidth required at 10:1 compression is 0.1 bits/cycle.

[2126] 12: Two separate reads of 1 bit/cycle.

[2127] 13: Write at 1 bit/cycle.

[2128] 14: Each tag can be consumed in at most 126 dot cycles and requires 128 bits. This is a maximum rate of 256 bits every 252 cycles.

[2129] 15: 17×64 bit reads per line in PEC1 is 5×256 bit reads per line in SoPEC. Double-line buffered storage.

[2130] 16: 128 bytes read per line is 4×256 bit reads per line. Double-line buffered storage.

[2131] 17: 5% dead nozzles 10-bit delta encoded stored with 6-bit dead nozzle mask requires 0.8 bits/cycle read access or a 256-bit access every 320 cycles. This assumes the dead nozzles are evenly spaced out. In practice dead nozzles are likely to be clumped. Peak bandwidth is estimated as 3 times average bandwidth.

[2132] 18: 6 bits/cycle requires 6×256 bit writes every 256 cycles.

[2133] 19: 6 bits/160 MHz SoPEC cycle average but will peak at 2×6 bits per 106 MHz print head cycle or 8 bits/SoPEC cycle. The PHI can equalise the DRAM access rate over the line so that the peak rate equals the average rate of 6 bits/cycle. The print head is clocked at an effective speed of 106 MHz.

[2134] 20: Assume one 256-bit read per 256 cycles is sufficient, i.e. maximum latency of 256 cycles per access is allowable.

[2135] 21: Refresh must occur every 3.2 ms. Refresh occurs row at a time over 5120 rows of 2 parallel 10 Mbit instances. Refresh must occur every 100 cycles. Each refresh takes 3 cycles.
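Several of the figures in the notes above can be reproduced from the stated access sizes. The following is a checking sketch, not part of the design: it assumes a 3-cycle random access followed by 2-cycle page mode accesses for the CDU burst, and 16 bits per dead nozzle table entry. All names are hypothetical.

```python
DRAM_WORD_BITS = 256


def cdu_burst_cycles(random_access=3, page_mode_access=2, words=4):
    """Cycles for one CDU page mode burst: one random access, then page mode accesses."""
    return random_access + page_mode_access * (words - 1)


def cdu_burst_interval(scale_factor):
    """Cycles between CDU write bursts of 4 accesses at the given scale factor."""
    return 4 * scale_factor ** 2


def cycles_per_access(bits_per_cycle):
    """Cycles between 256-bit accesses needed to sustain a bandwidth."""
    return DRAM_WORD_BITS / bits_per_cycle


def dnc_average_bits_per_cycle(dead_fraction=0.05, bits_per_entry=16):
    """Average dead nozzle table bandwidth with evenly spaced dead nozzles."""
    return dead_fraction * bits_per_entry
```

These give a 9-cycle CDU burst every 64 cycles at SF4 and every 144 cycles at SF6, a CFU access every 32 cycles at SF4, and a DNC access roughly every 320 cycles on average or every 106 cycles at the estimated peak of 3 times the average.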

[2136] 20.7 DIU BUS Topology

[2137] 20.7.1 Basic Topology

TABLE 110 SoPEC DIU Requesters
Read | Write | Other
CPU | CPU | Refresh
SCB | SCB |
CDU | CDU |
CFU | SFU |
LBD | DWU |
SFU | |
TE(TD) | |
TE(TFS) | |
HCU | |
DNC | |
LLU | |
PCU | |

[2138] Table 110 shows the DIU requesters in SoPEC. There are 12 read requesters and 5 write requesters in SoPEC as compared with 8 read requesters and 4 write requesters in PEC1. Refresh is an additional requester.

[2139] In PEC1, the interface between the DIU and the DIU requesters had the following main features:

[2140] separate control and address signals per DIU requester multiplexed in the DIU according to the arbitration scheme,

[2141] separate 64-bit write data bus for each DRAM write requester multiplexed in the DIU,

[2142] common 64-bit read bus from the DIU with separate enables to each DIU read requester.

[2143] Timing closure for this bussing scheme was straightforward in PEC1. This suggests that a similar scheme will also achieve timing closure in SoPEC. SoPEC has 5 more DRAM requesters but it will be in a 0.13 μm process with more metal layers and SoPEC will run at approximately the same speed as PEC1.

[2144] Using 256-bit busses would match the data width of the embedded DRAM but such large busses may result in an increase in size of the DIU and the entire SoPEC chip. The SoPEC requestors would require double 256-bit wide buffers to match the 256-bit busses. These buffers, which must be implemented in flip-flops, are less area efficient than 8-deep 64-bit wide register arrays which can be used with 64-bit busses. SoPEC will therefore use 64-bit data busses. Use of 256-bit busses would however simplify the DIU implementation as local buffering of 256-bit DRAM data would not be required within the DIU.

[2145] 20.7.1.1 CPU DRAM Access

[2146] The CPU is the only DIU requestor for which access latency is critical. All DIU write requesters transfer write data to the DIU using separate point-to-point busses. The CPU will use the cpu_dataout[31:0] bus. CPU reads will not be over the shared 64-bit read bus. Instead, CPU reads will use a separate 256-bit read bus.

[2147] 20.7.2 Making More Efficient Use of DRAM Bandwidth

[2148] The embedded DRAM is 256-bits wide. The 4 cycles it takes to transfer the 256-bits over the 64-bit data busses of SoPEC means that effectively each access will be at least 4 cycles long. It takes only 3 cycles to actually do a 256-bit random DRAM access in the case of IBM DRAM.

[2149] 20.7.2.1 Common Read Bus

[2150] If we have a common read data bus, as in PEC1, then if we are doing back to back read accesses the next DRAM read cannot start until the read data bus is free. So each DRAM read access can occur only every 4 cycles. This is shown in FIG. 86 with the actual DRAM access taking 3 cycles leaving 1 unused cycle per access.

[2151] 20.7.2.2 Interleaving CPU and Non-CPU Read Accesses

[2152] The CPU has a separate 256-bit read bus. All other read accesses are 256-bit accesses over a shared 64-bit read bus. Interleaving CPU and non-CPU read accesses means the effective duration of an interleaved access timeslot is the DRAM access time (3 cycles) rather than 4 cycles.

[2153] FIG. 87 shows interleaved CPU and non-CPU read accesses.

[2154] 20.7.2.3 Interleaving Read and Write Accesses

[2155] Having separate write data busses means write accesses can be interleaved with each other and with read accesses. So now the effective duration of an interleaved access timeslot is the DRAM access time (3 cycles) rather than 4 cycles. Interleaving is achieved by ordering the DIU arbitration slot allocation appropriately.

[2156] FIG. 88 shows interleaved read and write accesses. FIG. 89 shows interleaved write accesses.

[2157] 256-bit write data takes 4 cycles to transmit over 64-bit busses so a 256-bit buffer is required in the DIU to gather the write data from the write requester. The exception is CPU write data which is transferred in a single cycle.

[2158] FIG. 89 shows multiple write accesses being interleaved to obtain 3 cycle DRAM access. Since two write accesses can overlap, two sets of 256-bit write buffers and multiplexors to connect two write requestors simultaneously to the DIU are required.

[2159] Write requestors only require approximately one third of the total non-CPU bandwidth. This means that a rule can be introduced such that non-CPU write requestors are not allocated adjacent timeslots. This means that a single 256-bit write buffer and multiplexor to connect one write requestor at a time to the DIU is all that is required.

[2160] Note that if the rule prohibiting back-to-back non-CPU writes is not adhered to, then the second write slot of any attempted such pair will be disregarded and re-allocated under the unused read round-robin scheme.

[2161] 20.7.3 Bus Widths Summary

TABLE 111 SoPEC DIU Requesters Data Bus Width
Read | Bus access width | Write | Bus access width
CPU | 256 (separate) | CPU | 32
SCB | 64 (shared) | SCB | 64
CDU | 64 (shared) | CDU | 64
CFU | 64 (shared) | SFU | 64
LBD | 64 (shared) | DWU | 64
SFU | 64 (shared) | |
TE(TD) | 64 (shared) | |
TE(TFS) | 64 (shared) | |
HCU | 64 (shared) | |
DNC | 64 (shared) | |
LLU | 64 (shared) | |
PCU | 64 (shared) | |

[2162] 20.7.4 Conclusions

[2163] Timeslots should be programmed to maximise interleaving of shared read bus accesses with other accesses for 3 cycle DRAM access. The interleaving is achieved by ordering the DIU arbitration slot allocation appropriately. CPU arbitration has been designed to maximise interleaving with non-CPU requesters.

[2164] 20.8 SoPEC DRAM Addressing Scheme

[2165] The embedded DRAM is composed of 256-bit words. However the CPU-subsystem may need to write individual bytes of DRAM. Therefore it was decided to make the DIU byte addressable. 22 bits are required to byte address 20 Mbit of DRAM.

[2166] Most blocks read or write 256 bit words of DRAM. Therefore only the top 17 bits, i.e. bits 21 to 5, are required to address 256-bit word aligned locations.

[2167] The exceptions are

[2168] CDU which can write 64-bits so only the top 19 address bits, i.e. bits 21-3, are required.

[2169] CPU writes can be 8, 16 or 32-bits. The cpu_diu_wmask[1:0] pins indicate whether to write 8, 16 or 32 bits.

[2170] All DIU accesses must be within the same 256-bit aligned DRAM word. The exception is the CDU write access which is a write of 64-bits to each of 4 contiguous 256-bit DRAM words.
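The address widths quoted above can be confirmed with a short calculation. This is a sketch that assumes 1 Mbit = 2^20 bits; the function names are hypothetical.

```python
DRAM_BITS = 20 * (1 << 20)  # 20 Mbit, assuming 1 Mbit = 2**20 bits


def byte_address_bits(dram_bits=DRAM_BITS):
    """Address bits needed to reach the highest byte in the DRAM."""
    highest_byte_address = dram_bits // 8 - 1
    return highest_byte_address.bit_length()


def word_address(byte_addr):
    """256-bit word address, i.e. byte address bits [21:5]."""
    return byte_addr >> 5


def cdu_segment(byte_addr):
    """64-bit segment within a 256-bit word, i.e. byte address bits [4:3]."""
    return (byte_addr >> 3) & 0x3
```

Here byte_address_bits() returns 22, and dropping the low 5 bits leaves the 17-bit word address used by most requesters.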

[2171] 20.8.1 Write Address Constraints Specific to the CDU

[2172] Note the following conditions which apply to the CDU write address, due to the four masked page-mode writes which occur whenever a CDU write slot is arbitrated.

[2173] The CDU address presented to the DIU is cdu_diu_wadr[21:3].

[2174] Bits [4:3] indicate which 64-bit segment out of 256 bits should be written in 4 successive masked page-mode writes.

[2175] Each 10-Mbit DRAM macro has an input address port of width [15:0]. Of these bits, [2:0] are the “page address”. Page-mode writes, where you just vary these LSBs (i.e. the “page” or column address), but keep the rest of the address constant, are faster than random writes. This is taken advantage of for CDU writes.

[2176] To guarantee against trying to span a page boundary, the DIU treats “cdu_diu_wadr[6:5]” as being fixed at “00”.

[2177] From cdu_diu_wadr[21:3], an initial address of cdu_diu_wadr[21:7], concatenated with “00”, is used as the starting location for the first CDU write. This address is then auto-incremented a further three times.
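The address generation for the four masked page-mode writes can be sketched as follows. This is an illustrative model only: it takes the CDU address as a full 22-bit byte address (with bits [2:0] ignored) and returns the byte address of each 64-bit segment written. The function name is hypothetical.

```python
def cdu_write_addresses(cdu_diu_wadr):
    """Byte addresses of the four 64-bit segments written by one CDU burst."""
    # Bits [4:3] select the 64-bit segment within each 256-bit word.
    segment = (cdu_diu_wadr >> 3) & 0x3
    # Bits [21:7] are kept and bits [6:5] are treated as "00" for the first write.
    base = cdu_diu_wadr & ~0x7F
    # The 256-bit word address is then auto-incremented a further three times.
    return [base | (word << 5) | (segment << 3) for word in range(4)]
```

For example, an input of 0x1234F8 yields the addresses 0x123498, 0x1234B8, 0x1234D8 and 0x1234F8: the same segment in four consecutive 256-bit words, none of which crosses the page boundary.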

[2178] 20.9 DIU Protocols

[2179] The DIU protocols are

[2180] Pipelined, i.e. the following transaction is initiated while the previous transfer is in progress.

[2181] Split transaction, i.e. the transaction is split into independent address and data transfers.

[2182] 20.9.1 Read Protocol Except CPU

[2183] The SoPEC read requestors, except for the CPU, perform single 256-bit read accesses with the read data being transferred from the DIU in 4 consecutive cycles over a shared 64-bit read bus, diu_data[63:0]. The read address <unit>_diu_radr[21:5] is 256-bit aligned.

[2184] The read protocol is:

[2185] <unit>_diu_rreq is asserted along with a valid <unit>_diu_radr[21:5].

[2186] The DIU acknowledges the request with diu_<unit>_rack. The request should be deasserted. The minimum number of cycles between <unit>_diu_rreq being asserted and the DIU generating a diu_<unit>_rack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration; see Section 20.14.10).

[2187] The read data is returned on diu_data[63:0] and its validity is indicated by diu_<unit>_rvalid. The overall 256 bits of data are transferred over four cycles in the order: [63:0]->[127:64]->[191:128]->[255:192].

[2188] When four diu_<unit>_rvalid pulses have been received then if there is a further request <unit>_diu_rreq should be asserted again. diu_<unit>_rvalid will always be asserted by the DIU for four consecutive cycles. There is a fixed gap of 2 cycles between diu_<unit>_rack and the first diu_<unit>_rvalid pulse. For more detail on the timing of such reads and the implications for back-to-back sequences, see Section 20.14.10.
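The ordering of the four 64-bit transfers can be illustrated by a model of how a requester might reassemble the 256-bit word. This is a minimal sketch with hypothetical names; it models only the data ordering, not the request, acknowledge or valid signalling.

```python
MASK64 = (1 << 64) - 1


def assemble_word(beats):
    """Combine four 64-bit beats, received [63:0] first, into a 256-bit word."""
    if len(beats) != 4:
        raise ValueError("a 256-bit read is returned as exactly four 64-bit beats")
    word = 0
    for index, beat in enumerate(beats):
        # Beat 0 carries bits [63:0], beat 1 bits [127:64], and so on.
        word |= (beat & MASK64) << (64 * index)
    return word


def split_word(word):
    """Split a 256-bit word into four 64-bit beats in transfer order."""
    return [(word >> (64 * index)) & MASK64 for index in range(4)]
```

The same ordering applies to the 64-bit write data busses described below, so split_word gives the order in which a write requester would present its data.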

[2189] 20.9.2 Read Protocol for CPU

[2190] The CPU performs single 256-bit read accesses with the read data being transferred from the DIU over a dedicated 256-bit read bus for DRAM data, dram_cpu_data[255:0]. The read address cpu_adr[21:5] is 256-bit aligned.

[2191] The CPU DIU read protocol is:

[2192] cpu_diu_rreq is asserted along with a valid cpu_adr[21:5].

[2193] The DIU acknowledges the request with diu_cpu_rack. The request should be deasserted. The minimum number of cycles between cpu_diu_rreq being asserted and the DIU generating a diu_cpu_rack strobe is 1 cycle (1 cycle to perform the arbitration; see Section 20.14.10).

[2194] The read data is returned on dram_cpu_data[255:0] and its validity is indicated by diu_cpu_rvalid.

[2195] When the diu_cpu_rvalid pulse has been received then if there is a further request cpu_diu_rreq should be asserted again. The diu_cpu_rvalid pulse occurs with a gap of 1 cycle after diu_cpu_rack (1 cycle for the read data to be returned from the DRAM; see Section 20.14.10).

[2196] 20.9.3 Write Protocol Except CPU and CDU

[2197] The SoPEC write requestors, except for the CPU and CDU, perform single 256-bit write accesses with the write data being transferred to the DIU in 4 consecutive cycles over dedicated point-to-point 64-bit write data busses. The write address <unit>_diu_wadr[21:5] is 256-bit aligned.

[2198] The write protocol is:

[2199] <unit>_diu_wreq is asserted along with a valid <unit>_diu_wadr[21:5].

[2200] The DIU acknowledges the request with diu_<unit>_wack. The request should be deasserted. The minimum number of cycles between <unit>_diu_wreq being asserted and the DIU generating a diu_<unit>_wack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration; see Section 20.14.10).

[2201] In the clock cycles following diu_<unit>_wack the SoPEC Unit outputs the <unit>_diu_data[63:0], asserting <unit>_diu_wvalid. The first <unit>_diu_wvalid pulse can occur the clock cycle after diu_<unit>_wack. <unit>_diu_wvalid remains asserted for the following 3 clock cycles. This allows for reading from an SRAM where new data is available in the clock cycle after the address has changed, e.g. the address for the second 64-bits of write data is available the cycle after diu_<unit>_wack, meaning the second 64-bits of write data is a further cycle later. The overall 256 bits of data is transferred over four cycles in the order: [63:0]->[127:64]->[191:128]->[255:192].

[2202] Note that for SCB writes, each 64-bit quarter-word has an 8-bit byte enable mask associated with it. A different mask is used with each quarter-word. The 4 mask values are transferred along with their associated data, as shown in FIG. 92.

[2203] If four consecutive <unit>_diu_wvalid pulses are not provided by the requester, then the arbitration logic will disregard the write and re-allocate the slot under the unused read round-robin scheme.

[2204] Once all the write data has been output then if there is a further request <unit>_diu_wreq should be asserted again.

[2205] 20.9.4 CPU Write Protocol

[2206] The CPU performs single 128-bit writes to the DIU on a dedicatedwrite bus, cpu_diu_wdata[127:0]. There is an accompanying write mask,cpu_diu_wmask[15:0], consisting of 16 byte enables and the CPU alsosupplies a 128-bit aligned write address on cpu_diu_wadr[21:4]. Notethat writes are posted by the CPU to the DIU and stored in a 1-deep 35buffer. When the DAU subsequently arbitrates in favour of the CPU, thecontents of the buffer are written to DRAM.

[2207] The CPU write protocol, illustrated in FIG. 93, is as follows:

[2208] The DIU signals to the CPU via diu_cpu_write_rdy that its write buffer is empty and that the CPU may post a write whenever it wishes.

[2209] The CPU asserts cpu_diu_wdatavalid to enable a write into the buffer and to confirm the validity of the write address, data and mask.

[2210] The DIU de-asserts diu_cpu_write_rdy in the following cycle to indicate that its buffer is full and that the posted write is pending execution.

[2211] When the CPU is next awarded a DRAM access by the DAU, the buffer's contents are written to memory. The DIU re-asserts diu_cpu_write_rdy once the write data has been captured by DRAM, namely in the “MSN1” DCU state.

[2212] The CPU can then, if it wishes, asynchronously use the new value of diu_cpu_write_rdy to enable a new posted write in the same “MSN1” cycle.
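The posted-write handshake above can be sketched as a small behavioural model. The class PostedWriteBuffer and its methods are hypothetical. The mapping of mask bit i to byte i of the 128-bit word is an assumption not stated in the text, and cycle-level timing (including the same-cycle re-post in “MSN1”) is not modelled.

```python
class PostedWriteBuffer:
    """Toy model of the DIU's 1-deep CPU write buffer (illustrative only)."""

    def __init__(self):
        self.write_rdy = True      # models diu_cpu_write_rdy
        self.pending = None        # (address, data, mask) awaiting DRAM access
        self.dram = {}             # address -> 16-byte value, stands in for DRAM

    def post(self, adr, data, mask):
        """CPU asserts cpu_diu_wdatavalid; accepted only while the buffer is empty."""
        if not self.write_rdy:
            return False
        self.pending = (adr, data, mask)
        self.write_rdy = False     # buffer full, write pending execution
        return True

    def grant(self):
        """DAU arbitrates in favour of the CPU; the buffer is written to DRAM."""
        if self.pending is None:
            return
        adr, data, mask = self.pending
        old = self.dram.get(adr, bytes(16))
        # Assumed byte-enable mapping: bit i of the mask selects byte i.
        self.dram[adr] = bytes(
            data[i] if (mask >> i) & 1 else old[i] for i in range(16)
        )
        self.pending = None
        self.write_rdy = True      # re-asserted once DRAM has captured the data
```

A second post attempted while the first is still pending is refused in this model, which mirrors the de-asserted diu_cpu_write_rdy.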

[2213] 20.9.5 CDU Write Protocol

[2214] The CDU performs four 64-bit word writes to 4 contiguous 256-bit DRAM addresses with the first address specified by cdu_diu_wadr[21:3]. The write address cdu_diu_wadr[21:5] is 256-bit aligned with bits cdu_diu_wadr[4:3] allowing the 64-bit word to be selected.

[2215] The write protocol is:

[2216] cdu_diu_wreq is asserted along with a valid cdu_diu_wadr[21:3].

[2217] The DIU acknowledges the request with diu_cdu_wack. The request should be deasserted. The minimum number of cycles between cdu_diu_wreq being asserted and the DIU generating a diu_cdu_wack strobe is 2 cycles (1 cycle to register the request, 1 cycle to perform the arbitration; see Section 20.14.10).

[2218] In the clock cycles following diu_cdu_wack the CDU outputs the cdu_diu_data[63:0], together with asserted cdu_diu_wvalid. The first cdu_diu_wvalid pulse can occur the clock cycle after diu_cdu_wack. cdu_diu_wvalid remains asserted for the following 3 clock cycles. This allows for reading from an SRAM where new data is available in the clock cycle after the address has changed, e.g. the address for the second 64 bits of write data is available the cycle after diu_cdu_wack, meaning the second 64 bits of write data is a further cycle later. Data is transferred over the 4-cycle window in an order such that each successive 64 bits will be written to a monotonically increasing (by 1 location) 256-bit DRAM word.
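The address split used by the CDU can be sketched as follows. The function cdu_write_targets is hypothetical. The sketch assumes that cdu_diu_wadr is a byte address and that all four beats land in the same 64-bit position of successive 256-bit words; neither point is stated explicitly in the text.

```python
def cdu_write_targets(wadr: int) -> list[tuple[int, int]]:
    """Return (256-bit word index, 64-bit lane) for each of the four CDU beats.

    Assumes wadr is a byte address whose bits [21:3] are significant.
    """
    wadr &= 0x3FFFF8               # keep bits [21:3]
    base_word = wadr >> 5          # bits [21:5]: 256-bit aligned word index
    lane = (wadr >> 3) & 0x3       # bits [4:3]: which 64-bit word within it
    # Assumed: each successive beat goes to the next 256-bit word, same lane.
    return [(base_word + beat, lane) for beat in range(4)]
```

For example, an address of 0x28 selects lane 1 of 256-bit word 1, and the following three beats go to words 2, 3 and 4.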

[2219] If four consecutive cdu_diu_wvalid pulses are not provided with the data, then the arbitration logic will disregard the write and re-allocate the slot under the unused read round-robin scheme.

[2220] Once all the write data has been output, then if there is a further request cdu_diu_wreq should be asserted again.

[2221] 20.10 DIU Arbitration Mechanism

[2222] The DIU will arbitrate access to the embedded DRAM. The arbitration scheme is outlined in the next sections.

[2223] 20.10.1 Timeslot Based Arbitration Scheme

[2224] Table summarised the bandwidth requirements of the SoPEC requestors to DRAM. If we allocate the DIU requestors in terms of peak bandwidth then we require 35.25 bits/cycle (at SF=6) and 40.75 bits/cycle (at SF=4) for all the requestors except the CPU.

[2225] A timeslot scheme is defined with 64 main timeslots. The number of used main timeslots is programmable between 1 and 64.

[2226] Since DRAM read requesters, except for the CPU, are connected to the DIU via a 64-bit data bus, each 256-bit DRAM access requires 4 pclk cycles to transfer the read data over the shared read bus. The timeslot rotation period for 64 timeslots each of 4 pclk cycles is 256 pclk cycles or 1.6 μs, assuming pclk is 160 MHz. Each timeslot represents a 256-bit access every 256 pclk cycles or 1 bit/cycle. This is the granularity of the majority of DIU requesters' bandwidth requirements in Table.
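The rotation arithmetic in the preceding paragraph can be checked with a short calculation. The constant names are illustrative only; the values are those given in the text.

```python
PCLK_HZ = 160e6          # pclk frequency assumed in the text
SLOTS = 64               # main timeslots in a full rotation
CYCLES_PER_SLOT = 4      # one 256-bit access over a 64-bit bus
ACCESS_BITS = 256        # bits moved per timeslot

rotation_cycles = SLOTS * CYCLES_PER_SLOT                 # pclk cycles per rotation
rotation_us = rotation_cycles / PCLK_HZ * 1e6             # rotation period in microseconds
bits_per_cycle_per_slot = ACCESS_BITS / rotation_cycles   # bandwidth of a single timeslot
```

This reproduces the 256-cycle, 1.6 μs rotation and the 1 bit/cycle granularity per timeslot.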

[2227] The SoPEC DIU requesters can be represented using 4 bits (Table npage 288 on page 268). Using 64 timeslots means that to allocate each timeslot to a requester, a total of 64×5-bit configuration registers are required for the 64 main timeslots.

[2228] Timeslot based arbitration works by having a pointer point to the current timeslot. When re-arbitration is signaled the arbitration winner is the current timeslot and the pointer advances to the next timeslot. Each timeslot denotes a single access. The duration of the timeslot depends on the access.

[2229] Note that advancement through the timeslot rotation is dependent on an enable bit, RotationSync, being set. The consequences of clearing and setting this bit are described in section 20.14.12.2.1 on page 295.

[2230] If the SoPEC Unit assigned to the current timeslot is not requesting then the unused timeslot arbitration mechanism outlined in Section 20.10.6 is used to select the arbitration winner.

[2231] Note that there is always an arbitration winner for every slot. This is because the unused read re-allocation scheme includes refresh in its round-robin protocol. If all other blocks are not requesting, an early refresh will act as fall-back for the slot.

[2232] 20.10.2 Separate Read and Write Arbitration Windows

[2233] For write accesses, except the CPU, 256 bits of write data are transferred from the SoPEC DIU write requestors over 64-bit write busses in 4 clock cycles. This write data transfer latency means that write accesses, except for CPU writes and also the CDU, must be arbitrated 4 cycles in advance. (The CDU is an exception because CDU writes can start once the first 64 bits of write data have been transferred, since each 64 bits is associated with a write to a different 256-bit word.)

[2234] Since write arbitration must occur 4 cycles in advance, and the minimum duration of a timeslot is 3 cycles, the arbitration rules must be modified to initiate write accesses in advance. Accordingly, there is a write timeslot lookahead pointer, shown in FIG. 96, two timeslots in advance of the current timeslot pointer.

[2235] The following examples illustrate separate read and write timeslot arbitration with no adjacent write timeslots. (Recall the rule on adjacent write timeslots introduced in Section 20.7.2.3 on page 238.)

[2236] In FIG. 97 writes are arbitrated two timeslots in advance. Reads are arbitrated in the same timeslot as they are issued. Writes can be arbitrated in the same timeslot as a read. During arbitration the command address of the arbitrated SoPEC Unit is captured.

[2237] Other examples are shown in FIG. 98 and FIG. 99. The actual timeslot order is always the same as the programmed timeslot order, i.e. out of order accesses do not occur and data coherency is never an issue.

[2238] Each write must always incur a latency of two timeslots.

[2239] Startup latency may vary depending on the position of the first write timeslot. This startup latency is not important.

[2240] Table 112 shows the 4 scenarios depending on whether the current timeslot and write timeslot lookahead pointers point to read or write accesses.

TABLE 112 Arbitration with separate windows for read and write accesses

Current timeslot pointer   Write timeslot lookahead pointer   Actions
Read                       write                              Initiate DRAM read. Initiate write arbitration.
Read1                      read2                              Initiate DRAM read1.
Write1                     write2                             Initiate write2 arbitration. Execute DRAM write1.
Write                      read                               Execute DRAM write.

[2241] If the current timeslot pointer points to a read access then this will be initiated immediately. If the write timeslot lookahead pointer points to a write access then this access is arbitrated immediately, or immediately after the read access associated with the current timeslot pointer is initiated.

[2242] When a write access is arbitrated the DIU will capture the write address. When the current timeslot pointer advances to the write timeslot then the actual DRAM access will be initiated. Writes will therefore be arbitrated 2 timeslots in advance of the DRAM write occurring.

[2243] At initialisation, the write lookahead pointer points to the first timeslot. The current timeslot pointer is invalid until the write lookahead pointer advances to the third timeslot, when the current timeslot pointer will point to the first timeslot. Then both pointers advance in tandem.
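The interplay of the two pointers can be sketched with a minimal model. The function schedule is hypothetical. It assumes every slot is used by its programmed requester, ignores CPU pre-accesses and slot durations, and represents the rotation as a string of 'R' (read) and 'W' (write) slots.

```python
def schedule(rotation: str, steps: int) -> list[tuple[str, str]]:
    """List (current access, lookahead action) pairs for a rotation of 'R'/'W' slots."""
    n = len(rotation)
    out = []
    for step in range(steps):
        lookahead = rotation[step % n]   # slot under the write lookahead pointer
        # The current pointer trails the lookahead pointer by two slots and is
        # invalid for the first two steps after initialisation.
        current = rotation[(step - 2) % n] if step >= 2 else None
        if current is None:
            now = "idle"
        elif current == "R":
            now = "initiate DRAM read"
        else:
            now = "execute DRAM write"
        ahead = "arbitrate write" if lookahead == "W" else "none"
        out.append((now, ahead))
    return out
```

With the rotation "RWR", the write arbitrated at step 1 is executed at step 3, i.e. two timeslots after arbitration.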

[2244] CPU write accesses are excepted from the lookahead mechanism.

[2245] If the selected SoPEC Unit is not requesting then there will be separate read and write selection for unused timeslots. This is described in Section 20.10.6.

[2246] 20.10.3 Arbitration of CPU Accesses

[2247] What distinguishes the CPU from other SoPEC requesters is that the CPU requires minimum latency DRAM access, i.e. preferably the CPU should get the next available timeslot whenever it requests.

[2248] The minimum CPU read access latency is estimated in Table 113. This is the time between the CPU making a request to the DIU and receiving the read data back from the DIU.

TABLE 113 Estimated CPU read access latency ignoring caching

CPU read access latency                                        Duration
CPU cache miss                                                 1 cycle
CPU MMU logic issues request and DIU arbitration completes     1 cycle
Transfer the read address to the DRAM                          1 cycle
DRAM read latency                                              1 cycle
Register the read data in CPU bridge                           1 cycle
Register the read data in CPU                                  1 cycle
CPU cache miss                                                 1 cycle
CPU MMU logic issues request and DIU arbitration completes     1 cycle
TOTAL gap between requests                                     6 cycles

[2249] If the CPU, as is likely, requests DRAM access again immediately after receiving data from the DIU then the CPU could access every second timeslot if the access latency is 6 cycles. This assumes that interleaving is employed so that timeslots last 3 cycles. If the CPU access latency were 7 cycles, then the CPU would only be able to access every third timeslot.
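The relationship between access latency and achievable slot spacing is a ceiling division, as the following sketch shows. The function name cpu_slot_spacing is hypothetical.

```python
import math


def cpu_slot_spacing(latency_cycles: int, slot_cycles: int = 3) -> int:
    """Smallest number of timeslots between back-to-back CPU accesses."""
    # The CPU can only use a slot that starts once its previous access has
    # fully completed, so the latency is rounded up to whole timeslots.
    return math.ceil(latency_cycles / slot_cycles)
```

With 3-cycle timeslots, a 6-cycle latency gives a spacing of 2 (every second timeslot) and a 7-cycle latency gives a spacing of 3 (every third timeslot).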

[2250] If a cache hit occurs the CPU does not require DRAM access. For its next DIU access it will have to wait for its next assigned DIU slot. Cache hits therefore will reduce the number of DRAM accesses but not speed up any of those accesses.

[2251] To avoid the CPU having to wait for its next timeslot it is desirable to have a mechanism for ensuring that the CPU always gets the next available timeslot without incurring any latency on the non-CPU timeslots.

[2252] This can be done by defining each timeslot as consisting of a CPU access preceding a non-CPU access. Each timeslot will last 6 cycles, i.e. a CPU access of 3 cycles and a non-CPU access of 3 cycles. This is exactly the interleaving behaviour outlined in Section 20.7.2.2. If the CPU does not require an access, the timeslot will take 3 or 4 cycles and the timeslot rotation will go faster. A summary is given in Table 114.

TABLE 114 Timeslot access times

Access                         Duration                   Explanation
CPU access + non-CPU access    3 + 3 = 6 cycles           Interleaved access
non-CPU access                 4 cycles                   Access and preceding access both to shared read bus
non-CPU access                 3 cycles                   Access and preceding access not both to shared read bus
CDU write access               3 + 2 + 2 + 2 = 9 cycles   Page mode select signal is clocked at 160 MHz

[2253] CDU write accesses require 9 cycles. CDU write accesses preceded by a CPU access require 12 cycles. CDU timeslots therefore take longer than all other DIU requestors' timeslots.

[2254] With a 256 cycle rotation there can be 42 accesses of 6 cycles.

[2255] For low scale factor applications, it is desirable to have more timeslots available in the same 256 cycle rotation. So two counters of 4 bits each are defined, allowing the CPU to get a maximum of (CPUPreAccessTimeslots+1) pre-accesses for every (CPUTotalTimeslots+1) main slots. A timeslot counter starts at CPUTotalTimeslots and decrements every timeslot, while another counter starts at CPUPreAccessTimeslots and decrements every timeslot in which the CPU uses its access. When the CPU pre-access counter goes to zero before CPUTotalTimeslots, no further CPU accesses are allowed. When the CPUTotalTimeslots counter reaches zero both counters are reset to their respective initial values.

[2256] The CPU is not included in the list of SoPEC DIU requesters, Table, for the main timeslot allocations. The CPU cannot therefore be allocated main timeslots. It relies on pre-accesses in advance of such slots as the sole method for DRAM transfers.

[2257] CPU access to DRAM can never be fully disabled, since to do so would render SoPEC inoperable. Therefore the CPUPreAccessTimeslots and CPUTotalTimeslots register values are interpreted as follows: In each succeeding window of (CPUTotalTimeslots+1) slots, the maximum quota of CPU pre-accesses allowed is (CPUPreAccessTimeslots+1). The “+1” implementations mean that the CPU quota cannot be made zero.
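The quota interpretation above can be sketched as follows. The function grant_pre_accesses is hypothetical and models only the quota per window, not the exact down-counter hardware; it assumes windows are aligned to the start of the slot sequence.

```python
def grant_pre_accesses(cpu_wants, pre_access_timeslots, total_timeslots):
    """Return per-slot booleans: True where a CPU pre-access is granted."""
    window = total_timeslots + 1          # slots per window
    quota = pre_access_timeslots + 1      # CPU pre-accesses allowed per window
    grants = []
    used = 0
    for slot, wants in enumerate(cpu_wants):
        if slot % window == 0:
            used = 0                      # both counters reload at the window start
        granted = bool(wants) and used < quota
        used += granted                   # only granted pre-accesses consume quota
        grants.append(granted)
    return grants
```

With CPUPreAccessTimeslots = 1 and CPUTotalTimeslots = 3, a CPU that requests in every slot is granted the first two slots of each window of four. With both registers set to 0 the CPU is still granted one pre-access per slot, illustrating that the quota cannot be made zero.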

[2258] The various modes of operation are summarised in Table 115 with a nominal rotation period of 256 cycles.

TABLE 115 CPU timeslot allocation modes with nominal rotation period of 256 cycles

Access Type                  Nominal Timeslot   Number of         Notes
                             duration           timeslots
CPU Pre-access, i.e.         6 cycles           42 timeslots      Each access is CPU + non-CPU. If CPU does
CPUPreAccessTimeslots =                                           not use a timeslot then rotation is faster.
CPUTotalTimeslots
Fractional CPU Pre-access,   4 or 6 cycles      42-64 timeslots   Each CPU + non-CPU access requires a 6 cycle
i.e. CPUPreAccessTimeslots                                        timeslot. Individual non-CPU timeslots take
< CPUTotalTimeslots                                               4 cycles if current access and preceding
                                                                  access are both to shared read bus.
                                                                  Individual non-CPU timeslots take 3 cycles
                                                                  if current access and preceding access are
                                                                  not both to shared read bus.

[2259] 20.10.4 CDU Accesses

[2260] As indicated in Section 20.10.3, CDU write accesses require 9 cycles. CDU write accesses preceded by a CPU access require 12 cycles. CDU timeslots therefore take longer than all other DIU requestors' timeslots. This means that when a write timeslot is unused it cannot be re-allocated to a CDU write, as CDU accesses take 9 cycles. The write accesses which the CDU write could otherwise replace require only 3 or 4 cycles.

[2261] Unused CDU write accesses can be replaced by any other write access according to 20.10.6.1 Unused write timeslots allocation on page 247.

[2262] 20.10.5 Refresh Controller

[2263] Refresh is not included in the list of SoPEC DIU requesters, Table, for the main timeslot allocations. Timeslots cannot therefore be allocated to refresh.

[2264] The DRAM must be refreshed every 3.2 ms. Refresh occurs a row at a time over 5120 rows of 2 parallel 10 Mbit instances. A refresh operation must therefore occur every 100 cycles. The refresh_period register has a default value of 99. Each refresh takes 3 cycles.
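The 100-cycle figure follows from the refresh window, the row count and the clock rate. The short calculation below assumes pclk is 160 MHz, as elsewhere in this description, and assumes the default of 99 reflects a down-counter that counts 99 down to 0 inclusive; the constant names are illustrative only.

```python
PCLK_HZ = 160e6            # assumed pclk frequency, as used elsewhere in the text
REFRESH_WINDOW_S = 3.2e-3  # every row must be refreshed within this time
ROWS = 5120                # rows refreshed one at a time

cycles_per_window = round(PCLK_HZ * REFRESH_WINDOW_S)   # pclk cycles in 3.2 ms
cycles_per_refresh = cycles_per_window // ROWS          # cycles between row refreshes
refresh_period = cycles_per_refresh - 1                 # assumed reload value (counts N..0)
```

Under these assumptions the interval is 100 cycles and the reload value matches the stated default of 99.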

[2265] A refresh counter will count down the number of cycles between each refresh. When the down-counter reaches 0, the refresh controller will issue a refresh request and the down-counter is reloaded with the value in refresh_period and the count-down resumes immediately. Allocation of main slots must take into account that a refresh is required at least once every 100 cycles. Refresh is included in the unused read and write timeslot allocation. If unused timeslot allocation results in refresh occurring early by N cycles, then the refresh counter will have counted down to N. In this case, the refresh counter is reset to refresh_period and the count-down recommences. Refresh can be preceded by a CPU access in the same way as any other access. This is controlled by the CPUPreAccessTimeslots and CPUTotalTimeslots configuration registers. Refresh will therefore not affect CPU performance. A sequence of accesses including refresh might therefore be CPU, refresh, CPU, actual timeslot.

[2266] 20.10.6 Allocating Unused Timeslots

[2267] Unused slots are re-allocated separately depending on whether the unused access was a read access or a write access. This is best-effort traffic. Only unused non-CPU accesses are re-allocated.

[2268] 20.10.6.1 Unused Write Timeslots Allocation

[2269] Unused write timeslots are re-allocated according to a fixed priority order shown in Table 116.

TABLE 116 Unused write timeslot priority order

Name                               Priority Order
SCB(W)                             1
SFU(W)                             2
DWU                                3
Unused read timeslot allocation    4

[2270] CDU write accesses cannot be included in the unused timeslot allocation for write, as CDU accesses take 9 cycles. The write accesses which the CDU write could otherwise replace require only 3 or 4 cycles.

[2271] Unused write timeslot allocation occurs two timeslots in advance as noted in Section 20.10.2. If the units at priorities 1-3 are not requesting then the timeslot is re-allocated according to the unused read timeslot allocation scheme described in Section 20.10.6.2. However, the unused read timeslot allocation will occur when the current timeslot pointer of FIG. 96 reaches the timeslot, i.e. it will not occur in advance.

[2272] 20.10.6.2 Unused Read Timeslots Allocation

[2273] Unused read timeslots are re-allocated according to a two level round-robin scheme. The SoPEC Units included in read timeslot re-allocation are shown in Table 117.

TABLE 117 Unused read timeslot allocation

Name
SCB(R)
CDU(R)
CFU
LBD
SFU(R)
TE(TD)
TE(TFS)
HCU
DNC
LLU
PCU
CPU/Refresh

[2274] Each SoPEC requestor has an associated bit, ReadRoundRobinLevel, which indicates whether it is in level 1 or level 2 round-robin.

TABLE 118 Read round-robin level selection

Level                      Action
ReadRoundRobinLevel = 0    Level 1
ReadRoundRobinLevel = 1    Level 2

[2275] A pointer points to the most recent winner on each of the round-robin levels. Re-allocation is carried out by traversing level 1 requesters, starting with the one immediately succeeding the last level 1 winner. If a requesting unit is found, then it wins arbitration and the level 1 pointer is shifted to its position. If no level 1 unit wants the slot, then level 2 is similarly examined and its pointer adjusted.

[2276] Since refresh occupies a (shared) position on one of the two levels and continually requests access, there will always be some round-robin winner for any unused slot.
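The two-level traversal can be sketched with a small model. The class TwoLevelRoundRobin is hypothetical. The assignment of units to levels in the usage below is an arbitrary example, and the shared CPU/refresh position is simplified to a single "Refresh" entry.

```python
class TwoLevelRoundRobin:
    """Toy model of unused read timeslot re-allocation (illustrative only)."""

    def __init__(self, level1, level2):
        self.levels = [list(level1), list(level2)]
        # Index of the most recent winner on each level; -1 means none yet.
        self.last = [-1, -1]

    def pick(self, requesting):
        """Return the winner among `requesting`, searching level 1 then level 2."""
        for lvl, units in enumerate(self.levels):
            n = len(units)
            # Start with the unit immediately after the last winner on this level.
            for offset in range(1, n + 1):
                idx = (self.last[lvl] + offset) % n
                if units[idx] in requesting:
                    self.last[lvl] = idx      # pointer moves to the winner
                    return units[idx]
        return None
```

For example, with PCU and DNC on level 1 and CFU, LLU and Refresh on level 2, level 1 units win whenever they request, and Refresh acts as the fall-back when nothing else is requesting.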

[2277] 20.10.6.2.1 Shared CPU/Refresh Round-Robin Position

[2278] Note that the CPU can conditionally be allowed to take part in the unused read round-robin scheme. Its participation is controlled via the configuration bit EnableCPURoundRobin. When this bit is set, the CPU and refresh share a joint position in the round-robin order, shown in Table. When cleared, the position is occupied by refresh alone.

[2279] If the shared position is next in line to be awarded an unused non-CPU read/write slot, then the CPU will have first option on the slot. Only if the CPU doesn't want the access will it be granted to refresh. If the CPU is excluded from the round robin, then any awards to the position benefit refresh.

[2280] 20.11 Guidelines for Programming the DIU

[2281] Some guidelines for programming the DIU arbitration scheme are given in this section together with an example.

[2282] 20.11.1 Circuit Latency

[2283] Circuit latency is a fixed service delay which is incurred as and from the acceptance by the DIU arbitration logic of a block's pending read/write request. It is due to the processing time of the request, readying the data, plus the DRAM access time. Latencies differ for read and write requests. See Tables 79 and 80 for respective breakdowns.

[2284] If a requesting block is currently stalled, then the longest time it will have to wait between issuing a new request for data and actually receiving it would be its timeslot period, plus the circuit latency overhead, along with any intervening non-standard slot durations, such as refresh and CDU(W). In any case, a stalled block will always incur this latency as an additional overhead when coming out of a stall.

[2285] In the case where a block starts up or unstalls, it will start processing newly-received data at a time beyond its serviced timeslot equivalent to the circuit latency. If the block's timeslots are evenly spaced apart in time to match its processing rate (in the hope of minimising stalls), then the earliest that the block could restall, if not re-serviced by the DIU, would be the same latency delay beyond its next timeslot occurrence. Put another way, the latency incurred at start-up pushes the potential DIU-induced stall point out by the same fixed delta beyond each successive timeslot allocated to the block. This assumes that a block re-requests access well in advance of its upcoming timeslots. Thus, for a given stall-free run of operation, the circuit latency overhead is only incurred initially when unstalling.

[2286] While a block can be stalled as a result of how quickly the DIU services its DRAM requests, it is also prone to stalls caused by its upstream or downstream neighbours being unable to supply or consume data which is transferred between the blocks directly (as opposed to via the DIU). Such neighbour-induced stalls, often occurring at events like end of line, will have the effect that a block's DIU read buffer will tend to fill, as the block stops processing read data. Its DIU write buffer will also tend to fill, unable to despatch to DRAM until the downstream block frees up shared-access DRAM locations. This scenario is beneficial, in that when a block unstalls as a result of its neighbour releasing it, then that block's read/write DIU buffers will have a fill state less likely to stall it a second time as a result of DIU service delays.

[2287] A block's slots should be scheduled with a service guarantee in mind. This is dictated by the block's processing rate and hence, required access to the DRAM. The rate is expressed in terms of bits per cycle across a processing window, which is typically (though not always) 256 cycles. Slots should be evenly interspersed in this window (or “rotation”) so that the DIU can fulfill the block's service needs.

[2288] The following ground rules apply in calculating the distribution of slots for a given non-CPU block:

[2289] The block can, at maximum, suffer a stall once in the rotation (i.e. unstall and restall) and hence incur the circuit latency described above.

[2290] This rule is, by definition, always fulfilled by those blocks which have a service requirement of only

[2291] 1 bit/cycle (equivalent to 1 slot/rotation) or fewer. It can be shown that the rule is also satisfied by those blocks requiring more than 1 bit/cycle. See Section 20.12.1 Slot Distributions and Stall Calculations for Individual Blocks, on page 255.

[2292] Within the rotation, certain slots will be unavailable, due to their being used for refresh. (See Section 20.11.2 Refresh latencies)

[2293] In programming the rotation, account must be taken of the fact that any CDU(W) accesses will consume an extra 6 cycles/access, over and above the norm, in CPU pre-access mode, or 5 cycles/access without pre-access.

[2294] The total delay overhead due to latency, refreshes and CDU(W) can be factored into the service guarantee for all blocks in the rotation by deleting once (i.e. reducing the rotation window) that number of slots which equates to the cumulative duration of these various anomalies.

[2295] The use of lower scale factors will imply a more frequent demand for slots by non-CPU blocks. The percentage of slots in the overall rotation which can therefore be designated as CPU pre-access ones should be calculated last, based on what can be accommodated in the light of the non-CPU slot need.

[2296] Read latency is summarised below in Table 119.

TABLE 119 Read latency

Non-CPU read access latency                               Duration
non-CPU read requestor internally generates DIU request   1 cycle
register the non-CPU read request                         1 cycle
complete the arbitration of the request                   1 cycle
transfer the read address to the DRAM                     1 cycle
DRAM read latency                                         1 cycle
register the DRAM read data in DIU                        1 cycle
register the 1st 64-bits of read data in requester        1 cycle
register the 2nd 64-bits of read data in requester        1 cycle
register the 3rd 64-bits of read data in requester        1 cycle
register the 4th 64-bits of read data in requester        1 cycle
TOTAL                                                     10 cycles

[2297] Write latency is summarised in Table 120.

TABLE 120 Write latency

Non-CPU write access latency                               Duration
non-CPU write requestor internally generates DIU request   1 cycle
register the non-CPU write request                         1 cycle
complete the arbitration of the request                    1 cycle
transfer the acknowledge to the write requester            1 cycle
transfer the 1st 64 bits of write data to the DIU          1 cycle
transfer the 2nd 64 bits of write data to the DIU          1 cycle
transfer the 3rd 64 bits of write data to the DIU          1 cycle
transfer the 4th 64 bits of write data to the DIU          1 cycle
Write to DRAM with locally registered write data           1 cycle
TOTAL                                                      9 cycles

[2298] Timeslots removed to allow for read latency will also cover write latency, since the former is the larger of the two.

[2299] 20.11.2 Refresh Latencies

[2300] The number of allocated timeslots for each requester needs to take into account that a refresh must occur every 100 cycles. This can be achieved by deleting timeslots from the rotation, since the number of timeslots is made programmable.

[2301] Refresh is preceded by a CPU access in the same way as any other access. This is controlled by the CPUPreAccessTimeslots and CPUTotalTimeslots configuration registers. Refresh will therefore not affect CPU performance.

[2302] As an example, in CPU pre-access mode each timeslot will last 6 cycles. If the timeslot rotation has 50 timeslots then the rotation will last 300 cycles. The refresh controller will trigger a refresh every 100 cycles. Up to 47 timeslots can be allocated to the rotation ignoring refresh. Three timeslots deleted from the 50 timeslot rotation will allow for the latency of a refresh every 100 cycles.
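The number of timeslots to delete can be estimated as the number of refresh intervals that begin within one rotation. The function slots_for_refresh is hypothetical; rounding up to a whole slot per refresh is an assumption consistent with the example above.

```python
import math


def slots_for_refresh(rotation_slots, slot_cycles, refresh_interval=100):
    """Timeslots to delete so that each refresh in the rotation has a slot."""
    rotation_cycles = rotation_slots * slot_cycles
    # One slot is set aside for every refresh interval started in the rotation.
    return math.ceil(rotation_cycles / refresh_interval)
```

For the 50-slot, 6-cycle rotation this gives 3 deleted slots, leaving 47 for allocation. For a 256-cycle rotation it also gives 3.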

[2303] 20.11.3 Ensuring Sufficient DNC and PCU Access

[2304] PCU command reads from DRAM are exceptional events and should complete in as short a time as possible. Similarly, we must ensure there is sufficient free bandwidth for DNC accesses, e.g. when clusters of dead nozzles occur. In Table DNC is allocated 3 times average bandwidth. PCU and DNC can also be allocated to the level 1 round-robin allocation for unused timeslots so that unused timeslot bandwidth is preferentially available to them.

[2305] 20.11.4 Basing Timeslot Allocation on Peak Bandwidths

[2306] Since the embedded DRAM provides sufficient bandwidth to use 1:1 compression rates for the CDU and LBD, it is possible to simplify the main timeslot allocation by basing the allocation on peak bandwidths. As combined bi-level and tag bandwidth at 1:1 scaling is only 5 bits/cycle, we will usually only consider the contone scale factor as the variable in determining timeslot allocations.

[2307] If slot allocation is based on peak bandwidth requirements then DRAM access will be guaranteed to all SoPEC requesters. If we do not allocate slots for peak bandwidth requirements then we can also allow for the peaks deterministically by adding some cycles to the print line time.

[2308] 20.11.5 Adjacent Timeslot Restrictions

[2309] 20.11.5.1 Non-CPU Write Adjacent Timeslot Restrictions

[2310] Non-CPU write requesters should not be assigned adjacent timeslots as described in Section 20.7.2.3. This is because adjacent timeslots assigned to non-CPU requestors would require two sets of 256-bit write buffers and multiplexors to connect two write requestors simultaneously to the DIU. Only one 256-bit write buffer and multiplexor is implemented. Recall from section 20.7.2.3 on page 238 that if adjacent non-CPU writes are attempted, the second write of any such pair will be disregarded and re-allocated under the unused read scheme.

[2311] 20.11.5.2 Same DIU Requestor Adjacent Timeslot Restrictions

[2312] All DIU requesters have state-machines which request and transfer the read or write data before requesting again. From FIG. 90 read requests have a minimum separation of 9 cycles. From FIG. 92 write requests have a minimum separation of 7 cycles. Therefore adjacent timeslots should not be assigned to a particular DIU requester because the requester will not be able to make use of all these slots.

[2313] In the case that a CPU access precedes a non-CPU access, timeslots last 6 cycles, so write and read requesters can only make use of every second timeslot. In the case that timeslots are not preceded by CPU accesses, timeslots last 4 cycles, so the same write requester can use every second timeslot but the same read requestor can use only every third timeslot. Some DIU requestors may introduce additional pipeline delays before they can request again. Therefore timeslots should be separated by more than the minimum to allow a margin.
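The spacing rules above follow from rounding the minimum request separation up to whole timeslots. The sketch below uses the separations quoted from FIG. 90 and FIG. 92; the function name usable_every_nth is hypothetical.

```python
import math

READ_MIN_SEPARATION = 9    # cycles between read requests (from the text)
WRITE_MIN_SEPARATION = 7   # cycles between write requests (from the text)


def usable_every_nth(min_separation, slot_cycles):
    """Return n such that one requester can use at most every n-th timeslot."""
    return math.ceil(min_separation / slot_cycles)
```

With 6-cycle timeslots both reads and writes can use every second timeslot. With 4-cycle timeslots writes can still use every second timeslot, but reads only every third.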

[2314] 20.11.6 Line Margin

[2315] The SFU must output 1 bit/cycle to the HCU. Since HCUNumDots may not be a multiple of 256 bits, the last 256-bit DRAM word on the line can contain extra zeros. In this case, the SFU may not be able to provide 1 bit/cycle to the HCU. This could lead to a stall by the SFU. This stall could then propagate if the margins being used by the HCU are not sufficient to hide it. The maximum stall can be estimated by the calculation: DRAM service period − X scale factor × dots used from last DRAM read for HCU line.

[2316] Similarly, if the line length is not a multiple of 256 bits then, e.g., the LLU could read data from DRAM which contains padded zeros. This could lead to a stall. This stall could then propagate if the page margins cannot hide it.

[2317] A single addition of 256 cycles to the line time will suffice for all DIU requesters to mask these stalls.

[2318] 20.12 Example Outline DIU Programming

TABLE 121 Timeslot allocation based on peak bandwidth

Block Name   Direction   Peak Bandwidth which must be    MainTimeslots allocated
                         supplied (bits/cycle)
SCB          R W         0.734⁷                          1
CDU          R           0.9 (SF = 6), 2 (SF = 4)        1 (SF = 6), 2 (SF = 4)
             W           1.8 (SF = 6),⁸ 4 (SF = 4)       2 (SF = 6), 4 (SF = 4)
CFU          R           5.4 (SF = 6), 8 (SF = 4)        6 (SF = 6), 8 (SF = 4)
LBD          R           1                               1
SFU          R           2                               2
             W           1                               1
TE(TD)       R           1.02                            1
TE(TFS)      R           0.093                           0
HCU          R           0.074                           0
DNC          R           2.4                             3
DWU          W           6                               6
LLU          R           8                               8
PCU          R           1                               1
TOTAL                                                    33 (SF = 6), 38 (SF = 4)

[2319] Table 121 shows an allocation of main timeslots based on the peak bandwidths of Table. The bandwidth required for each unit is calculated allowing extra cycles for read and write circuit latency for each access requiring a bandwidth of more than 1 bit/cycle. Fractional bandwidth is supplied via unused read slots.
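The totals of Table 121 can be cross-checked by summing the MainTimeslots column for each scale factor. The dictionary below simply restates the table; the name MAIN_TIMESLOTS and the key layout are illustrative only.

```python
# (block, direction) -> (main timeslots at SF=6, main timeslots at SF=4),
# transcribed from Table 121.
MAIN_TIMESLOTS = {
    ("SCB", "R/W"): (1, 1),
    ("CDU", "R"): (1, 2),
    ("CDU", "W"): (2, 4),
    ("CFU", "R"): (6, 8),
    ("LBD", "R"): (1, 1),
    ("SFU", "R"): (2, 2),
    ("SFU", "W"): (1, 1),
    ("TE(TD)", "R"): (1, 1),
    ("TE(TFS)", "R"): (0, 0),
    ("HCU", "R"): (0, 0),
    ("DNC", "R"): (3, 3),
    ("DWU", "W"): (6, 6),
    ("LLU", "R"): (8, 8),
    ("PCU", "R"): (1, 1),
}

# Sum each column to obtain the totals for SF=6 and SF=4 respectively.
totals = tuple(sum(column) for column in zip(*MAIN_TIMESLOTS.values()))
```

The sums come to 33 timeslots at SF=6 and 38 at SF=4, in agreement with the TOTAL row.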

[2320] The timeslot rotation is 256 cycles. Timeslots are deleted from the rotation to allow for circuit latencies for accesses of up to 1 bit per cycle, i.e. 1 timeslot per rotation.

EXAMPLE 1 Scale-Factor=6

[2321] Program the MainTimeslot configuration register (Table) for peak required bandwidths of SoPEC Units according to the scale factor.

[2322] Program the read round-robin allocation to share unused read slots. Allocate PCU, DNC, HCU and TFS to level 1 read round-robin.

[2323] Assume scale-factor of 6 and peak bandwidths from Table

[2324] Assign all DIU requestors except TE(TFS) and HCU to multiples of 1 timeslot, as indicated in Table, where each timeslot is 1 bit/cycle. This requires 33 timeslots.

[2325] No timeslots are explicitly allocated for the fractional bandwidth requirements of TE(TFS) and HCU accesses. Instead, these units are serviced via unused read slots.

[2326] Allow 3 timeslots to allow for 3 refreshes in the rotation.

[2327] Therefore, 36 scheduled slots are used in the rotation for main timeslots and refreshes, some or all of which may be able to have a CPU pre-access, provided they fit in the rotation window.

[2328] Each of the 2 CDU(W) accesses requires 9 cycles. Per access, this implies an overhead of 1 slot (12 cycles instead of 6) in pre-access mode, or 1.25 slots (9 cycles instead of 4) for no pre-access. The cumulative overhead of the two accesses is either 2 slots (pre-access) or 3 slots (no pre-access).

[2329] Assuming all blocks require a service guarantee of no more than a single stall across 256 bits, allow 10 cycles for read latency, which also takes care of 9-cycle write latency. This can be accounted for by reserving 2 six-cycle slots (CPU pre-access) or 3 four-cycle slots (no pre-access).

[2330] Assume a 256 cycle timeslot rotation.

[2331] CDU(W) and read latency reduce the number of available cycles in a rotation to: 256−2×6−2×6=232 cycles (CPU pre-access) or 256−3×4−3×4=232 cycles (no pre-access).

[2332] As a result, 232 cycles available for 36 accesses implies each access can take 232/36=6.44 cycles maximum. So, all accesses can have a pre-access.

[2333] Therefore the CPU achieves a pre-access ratio of 36/36=100% of slots in the rotation.

EXAMPLE 2 Scale-Factor=4

[2334] Program the MainTimeslot configuration register (Table) for peak required bandwidths of SoPEC Units according to the scale factor. Program the read round-robin allocation to share unused read slots. Allocate PCU, DNC, HCU and TFS to level 1 read round-robin.

[2335] Assume scale-factor of 4 and peak bandwidths from Table

[2336] Assign all DIU requestors except TE(TFS) and HCU to multiples of 1 timeslot, as indicated in Table, where each timeslot is 1 bit/cycle. This requires 38 timeslots.

[2337] No timeslots are explicitly allocated for the fractional bandwidth requirements of TE(TFS) and HCU accesses. Instead, these units are serviced via unused read slots.

[2338] Allow 3 timeslots to allow for 3 refreshes in the rotation.

[2339] Therefore, 41 scheduled slots are used in the rotation for main timeslots and refreshes, some or all of which can have a CPU pre-access, provided they fit in the rotation window.

[2340] Each of the 4 CDU(W) accesses requires 9 cycles. Per access, this implies an overhead of 1 slot (12 cycles instead of 6) for pre-access mode, or 1.25 slots (9 cycles instead of 4) for no pre-access. The cumulative overhead of the four accesses is either 4 slots (pre-access) or 5 slots (no pre-access).

[2341] Assuming all blocks require a service guarantee of no more than a single stall across 256 bits, allow 10 cycles for read latency, which also takes care of 9-cycle write latency. This can be accounted for by reserving 2 six-cycle slots (CPU pre-access) or 3 four-cycle slots (no pre-access).

[2342] Assume a 256 cycle timeslot rotation.

[2343] CDU(W) and read latency reduce the number of available cycles in a rotation to: 256−4×6−2×6=220 cycles (CPU pre-access) or 256−5×4−3×4=224 cycles (no pre-access).

[2344] As a result, between 220 and 224 cycles are available for 41 accesses, which implies each access can take between 220/41=5.36 cycles and 224/41=5.46 cycles.

[2345] Work out how many slots can have a pre-access: For the lower number of 220 cycles, this implies (41−n)*6+n*4<=220, where n=number of slots with no pre-access cycle. Solving the inequality gives n>=13. Check answer: 28*6+13*4=220.

[2346] So 28 slots out of the 41 in the rotation can have CPU pre-accesses.

[2347] The CPU thus achieves a pre-access ratio of 28/41=68.3% of slots in the rotation.
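The slot budgeting in Examples 1 and 2 can be checked numerically. The following Python sketch is illustrative only: the function name `preaccess_slots` and its parameters are assumptions introduced here, and it models only the worst-case budget used in the examples, in which every CDU(W) access costs one extra six-cycle slot and read latency reserves two six-cycle slots.

```python
def preaccess_slots(rotation, scheduled, cduw_accesses, pre_slot=6, plain_slot=4):
    """Return (available_cycles, slots_that_can_have_a_CPU_pre_access).

    rotation       -- cycles in the timeslot rotation (256 in the examples)
    scheduled      -- scheduled main timeslots plus refreshes
    cduw_accesses  -- number of CDU(W) accesses in the rotation
    """
    # Worst-case budget: one extra 6-cycle slot per CDU(W) access,
    # plus two 6-cycle slots reserved for read latency.
    available = rotation - cduw_accesses * pre_slot - 2 * pre_slot

    # Find the smallest n (slots without pre-access) satisfying
    # (scheduled - n) * pre_slot + n * plain_slot <= available.
    deficit = scheduled * pre_slot - available
    step = pre_slot - plain_slot
    n = max(0, -(-deficit // step))  # ceiling division, clamped at zero
    return available, scheduled - n


# Example 2 (scale factor 4): 41 scheduled slots, 4 CDU(W) accesses.
cycles, with_pre = preaccess_slots(256, 41, 4)
print(cycles, with_pre, round(100 * with_pre / 41, 1))
```

With the Example 1 inputs (36 scheduled slots, 2 CDU(W) accesses) the same function reports that no slot needs to give up its pre-access.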

[2348] 20.12.1 Slot Distributions and Stall Calculations for Individual Blocks

[2349] The following sections show how the slots for blocks with a service requirement greater than 1 bit/cycle should be distributed. Calculations are included to check that such blocks will not suffer more than one stall per rotation.

[2350] 20.12.1.1 SFU

[2351] This has 2 bits/cycle on read, but this is two separate channels of 1 bit/cycle sharing the same DIU interface. It is therefore effectively 2 channels each of 1 bit/cycle, so allowing the same margins as the LBD will work.

[2352] 20.12.1.2 DWU

[2353] The DWU has 12 double buffers in each of the 6 colour planes, odd and even. These buffers are filled by the DNC and will request DIU access when double buffers fill. The DNC supplies 6 bits to the DWU every cycle (6 odd in one cycle, 6 even in the next cycle). So the service deadline is 512 cycles, given 6 accesses per 256-cycle rotation.

[2354] 20.12.1.3 CFU

[2355] Here the requirement is that the DIU stall should be less than the time taken for the CFU to consume one third of its triple buffer. The total DIU stall=refresh latency+extra CDU(W) latency+read circuit latency=3+5 (for 4 cycle timeslots)+10=18 cycles. The CFU can consume its data at 8 bits/cycle at SF=4. Therefore 256 bits of data will last 32 cycles so the triple buffer is safe. In fact we only need an extra 144 bits of buffering (18 cycles at 8 bits/cycle), or 3×64 bits. But it is safer to have the full extra 256 bits or 4×64 bits of buffering.

[2356] 20.12.1.4 LLU

[2357] The LLU has 2 channels, each of which could request at 6 bits/106 MHz cycle or 4 bits/160 MHz cycle, giving a total of 8 bits/160 MHz cycle. The service deadline for each channel is 256×106 MHz cycles, i.e. all 6 colours must be transferred in 256 cycles to feed the printhead. This equates to 384×160 MHz cycles.

[2358] Over a span of 384 cycles, there will be 6 CDU(W) accesses, 4 refreshes and one read latency encountered at most. Assuming CPU pre-accesses for these occurrences, this means the number of available cycles is given by 384−6×6−4×6−10=314 cycles.

[2359] For a CPU pre-access slot rate of 50%, 314 cycles implies 31 CPU and 63 non-CPU accesses (31×6+32×4=314). Twelve LLU accesses interspersed amongst these 63 non-CPU slots imply an LLU allocation rate of approximately one slot in 5.

[2360] If the CPU pre-access is 100% across all slots, then 314 cycles gives 52 slots each to CPU and non-CPU accesses (52×6=312 cycles). Twelve accesses spread over 52 slots implies a 1-in-4 slot allocation to the LLU.

[2361] The same LLU slot allocation rate (1 slot in 5, or 1 in 4) can be applied to programming slots across a 256-cycle rotation window. The window size does not affect the occurrence of LLU slots, so the 384-cycle service requirement will be fulfilled.
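The LLU allocation arithmetic above can be reproduced with a short calculation. This Python sketch is an illustration under stated assumptions: the function name `llu_slot_allocation` and the returned keys are hypothetical, and the 50% case is modelled as alternating six-cycle and four-cycle slots, which is how the 31×6+32×4 breakdown in the text reads.

```python
def llu_slot_allocation(span=384, cduw=6, refreshes=4, read_latency=10,
                        llu_accesses=12):
    """Estimate how often an LLU slot must occur within the service span."""
    # CDU(W) accesses and refreshes are each costed at 6 cycles (CPU pre-access).
    available = span - cduw * 6 - refreshes * 6 - read_latency

    # 50% pre-access rate: slots alternate between 6 and 4 cycles.
    pairs, leftover = divmod(available, 6 + 4)
    slots_half = 2 * pairs + leftover // 4

    # 100% pre-access rate: every slot takes 6 cycles.
    slots_full = available // 6

    return {
        "available_cycles": available,
        "non_cpu_slots_50pct": slots_half,
        "non_cpu_slots_100pct": slots_full,
        "one_slot_in_50pct": slots_half // llu_accesses,
        "one_slot_in_100pct": slots_full // llu_accesses,
    }


print(llu_slot_allocation())
```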

[2362] 20.12.1.5 DNC

[2363] This has a 2.4 bits/cycle bandwidth requirement. Each access will see the DIU stall of 18 cycles. 2.4 bits/cycle corresponds to an access every 106 cycles within a 256 cycle rotation. So to allow for DIU latency we need an access every 106−18 or 88 cycles. This is a bandwidth of 2.9 bits/cycle, requiring 3 timeslots in the rotation.
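The DNC sizing follows a pattern that applies to any fractional-bandwidth requester: shorten the access interval by the worst-case stall, then round the resulting bandwidth up to whole timeslots. A minimal Python sketch follows; the function name `timeslots_needed` is an assumption introduced here, and it assumes one timeslot delivers one 256-bit access per 256-cycle rotation (1 bit/cycle), as stated for Example 2.

```python
import math


def timeslots_needed(bits_per_cycle, stall_cycles, access_bits=256):
    """Return (tightened_interval, effective_bits_per_cycle, timeslots)."""
    # Nominal interval between accesses, truncated to whole cycles.
    interval = int(access_bits / bits_per_cycle)
    # Each access may arrive late by the DIU stall, so request earlier.
    tightened = interval - stall_cycles
    effective = access_bits / tightened
    # One timeslot is worth 1 bit/cycle, so round up to whole slots.
    return tightened, effective, math.ceil(effective)


interval, bandwidth, slots = timeslots_needed(2.4, 18)
print(interval, round(bandwidth, 1), slots)
```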

[2364] 20.12.1.6 CDU

[2365] The JPEG decoder produces 8 bits/cycle. Peak CDUR[ead] bandwidth is 4 bits/cycle (SF=4), peak CDUW[rite] bandwidth is 4 bits/cycle (SF=4), both with 1.5 DRAM buffering.

[2366] The CDU(R) does a DIU read every 64 cycles at scale factor 4 with 1.5 DRAM buffering. The delay in being serviced by the DIU could be read circuit latency (10)+refresh (3)+extra CDU(W) cycles (6)=19 cycles. The JPEG decoder can consume each 256 bits of DIU-supplied data at 8 bits/cycle, i.e. in 32 cycles. If the DIU is 19 cycles late (due to latency) in supplying the read data then the JPEG decoder will have finished processing the read data 32+19=51 cycles after the DIU access. This is 64−51=13 cycles in advance of the next read. This 13 cycles is the upper limit on how much the DIU read service can further be delayed, without causing a stall. Given this margin, a stall on the read side will not occur.

[2367] On the write side, for scale factor 4, the access pattern is a DIU write every 64 cycles with 1.5 DRAM buffering. The JPEG decoder runs at 8 bits/cycle and consumes 256 bits in 32 cycles. The CDU will not stall if the JPEG decode time (32)+DIU stall (19)<64, which is true.
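The CDU stall check reduces to comparing the access period with the sum of the decode time and the worst-case DIU delay. The Python sketch below restates that check; the function name `cdu_margin` and its parameter names are hypothetical, and the default values are the scale-factor-4 figures quoted in the text.

```python
def cdu_margin(period=64, word_bits=256, decode_rate=8,
               read_latency=10, refresh=3, cduw_extra=6):
    """Return the slack, in cycles, before the CDU would stall.

    A positive result means the DIU service can slip by that many
    further cycles without stalling the JPEG decoder.
    """
    diu_delay = read_latency + refresh + cduw_extra   # worst-case DIU delay
    decode_time = word_bits // decode_rate            # cycles per 256-bit word
    return period - (decode_time + diu_delay)


margin = cdu_margin()
print("no stall" if margin > 0 else "stall possible", margin)
```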

[2368] 20.13 CPU DRAM Access Performance

[2369] The CPU's share of the timeslots can be specified in terms of guaranteed bandwidth and average bandwidth allocations.

[2370] The CPU's access rate to memory depends on the CPU read access latency, i.e. the time between the CPU making a request to the DIU and receiving the read data back from the DIU, and on

[2371] how often it can get access to DIU timeslots.

[2372] Table estimated the CPU read latency as 6 cycles.

[2373] How often the CPU can get access to DIU timeslots depends on the access type. This is summarised in Table 122.

TABLE 122 CPU DRAM access performance
Access Type | Nominal Timeslot Duration | CPU DRAM access rate | Notes
CPU Pre-access | 6 cycles | Lower bound (guaranteed bandwidth) is 160 MHz/6 = 26.67 MHz | CPU can access every timeslot.
Fractional CPU Pre-access | 4 or 6 cycles | Lower bound (guaranteed bandwidth) is (160 MHz * N/P) | CPU accesses precede a fraction N of timeslots, where N = C/T, C = CPUPreAccessTimeslots, T = CPUTotalTimeslots, P = (6*C + 4*(T − C))/T
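The guaranteed-bandwidth expression in Table 122 can be evaluated directly. The following Python sketch is an illustration only: the function name is hypothetical, and the arguments are taken to be actual slot counts. Whether the raw register values (which the register descriptions give as "plus one" quantities) must be adjusted before use is an assumption left to the reader.

```python
def cpu_guaranteed_rate_mhz(pre_access_slots, total_slots, clock_mhz=160.0):
    """Lower-bound CPU DRAM access rate, 160 MHz * N / P, from Table 122.

    pre_access_slots -- C, slots preceded by a CPU access (a count, >= 1)
    total_slots      -- T, slots in the CPU pre-access pattern (T >= C)
    """
    if not 1 <= pre_access_slots <= total_slots:
        raise ValueError("require 1 <= C <= T")
    n = pre_access_slots / total_slots
    # Average slot duration: 6 cycles with a pre-access, 4 without.
    p = (6 * pre_access_slots + 4 * (total_slots - pre_access_slots)) / total_slots
    return clock_mhz * n / p


print(round(cpu_guaranteed_rate_mhz(1, 1), 2))  # every slot has a pre-access
print(round(cpu_guaranteed_rate_mhz(1, 2), 2))  # every second slot
```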

[2374] In both CPU Pre-access and Fractional CPU Pre-access modes, if the CPU is not requesting, the timeslots will have a duration of 3 or 4 cycles depending on whether the current access and preceding access are both to the shared read bus. This will mean that the timeslot rotation will run faster and more bandwidth is available.

[2375] If the CPU runs out of its instruction cache then instruction fetch performance is only limited by the on-chip bus protocol. If data resides in the data cache then 160 MHz performance is achieved. Accessing memory mapped registers, PSS or ROM with a 3 cycle bus protocol (address cycle+data cycle) gives 53 MHz performance.

[2376] Due to the action of CPU caching, some bandwidth limiting of the CPU in Fractional CPU Pre-access mode is expected to have little or no impact on the overall CPU performance.

[2377] 20.14 Implementation

[2378] The DRAM Interface Unit (DIU) is partitioned into 2 logical blocks to facilitate design and verification.

[2379] a. The DRAM Arbitration Unit (DAU) which interfaces with the SoPEC DIU requesters.

[2380] b. The DRAM Controller Unit (DCU) which accesses the embedded DRAM.

[2381] The basic principle in design of the DIU is to ensure that the eDRAM is accessed at its maximum rate while keeping the CPU read access latency as low as possible.

[2382] The DCU is designed to interface with single bank 20 Mbit IBM Cu-11 embedded DRAM performing random accesses every 3 cycles. Page mode bursts of 4 write accesses, associated with the CDU, are also supported.

[2383] The DAU is designed to support interleaved accesses allowing the DRAM to be accessed every 3 cycles where back-to-back accesses do not occur over the shared 64-bit read data bus.

[2384] 20.14.1 DIU Partition

[2385] 20.14.2 Definition of DCU IO

TABLE 123 DCU interface
Port Name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | SoPEC Functional clock
dau_dcu_reset_n | 1 | In | Active-low, synchronous reset in pclk domain. Incorporates DAU hard and soft resets.
Inputs from DAU
dau_dcu_msn2stall | 1 | In | Signal from DAU Arbitration Logic which, when asserted, stalls DCU in MSN2 state.
dau_dcu_adr[21:5] | 17 | In | Signal indicating the address for the DRAM access. This is a 256-bit aligned DRAM address.
dau_dcu_rwn | 1 | In | Signal indicating the direction for the DRAM access (1 = read, 0 = write).
dau_dcu_cduwpage | 1 | In | Signal indicating if access is a CDU write page mode access (1 = CDU page mode, 0 = not CDU page mode).
dau_dcu_refresh | 1 | In | Signal indicating that a refresh command is to be issued. If asserted dau_dcu_adr, dau_dcu_rwn and dau_dcu_cduwpage are ignored.
dau_dcu_wdata | 256 | In | 256-bit write data to DCU
dau_dcu_wmask | 32 | In | Byte encoded write data mask for 256-bit dau_dcu_wdata to DCU. Polarity: A “1” in a bit field of dau_dcu_wmask means that the corresponding byte in the 256-bit dau_dcu_wdata is written to DRAM.
Outputs to DAU
dcu_dau_adv | 1 | Out | Signal indicating to DAU to supply next command to DCU
dcu_dau_wadv | 1 | Out | Signal indicating to DAU to initiate next non-CPU write
dcu_dau_refreshcomplete | 1 | Out | Signal indicating that the DCU has completed a refresh.
dcu_dau_rdata | 256 | Out | 256-bit read data from DCU.
dcu_dau_rvalid | 1 | Out | Signal indicating valid read data on dcu_dau_rdata.

[2386] 20.14.3 DRAM Access Types

[2387] The DRAM access types used in SoPEC are summarised in Table 124. For a refresh operation the DRAM generates the address internally.

TABLE 124 SoPEC DRAM access types
Type | Access
Read | Random 256-bit read
Write | Random 256-bit write with byte write masking
Write | Page mode write for burst of 4 256-bit words with byte write masking
Refresh | Single refresh

[2388] 20.14.4 Constructing the 20 Mbit DRAM From Two 10 Mbit Instances

[2389] The 20 Mbit DRAM is constructed from two 10 Mbit instances. The address ranges of the two instances are shown in Table 125.

TABLE 125 Address ranges of the two 10 Mbit instances in the 20 Mbit DRAM
Instance | Address | Hex 256-bit word address | Binary 256-bit word address
Instance0 | First word in lower 10 Mbit | 00000 | 0 0000 0000 0000 0000
Instance0 | Last word in lower 10 Mbit | 09FFF | 0 1001 1111 1111 1111
Instance1 | First word in upper 10 Mbit | 0A000 | 0 1010 0000 0000 0000
Instance1 | Last word in upper 10 Mbit | 13FFF | 1 0011 1111 1111 1111

[2390] There are separate macro select signals, inst0_MSN and inst1_MSN, for each instance and separate dataout busses inst0_DO and inst1_DO, which are multiplexed in the DCU. Apart from these signals both instances share the DRAM output pins of the DCU.

[2391] The DRAM Arbitration Unit (DAU) generates a 17 bit address, dau_dcu_adr[21:5], sufficient to address all 256-bit words in the 20 Mbit DRAM. The upper 5 bits are used to select between the two memory instances by gating their MSN pins. If instance1 is selected then the lower 16-bits are translated to map into the 10 Mbit range of that instance. The multiplexing and address translation rules are shown in Table 126.

[2392] In the case that the DAU issues a refresh, indicated by dau_dcu_refresh, then both macros are selected. The other control signals

TABLE 126 Instance selection and address translation
dau_dcu_refresh | dau_dcu_adr[21:17] | Instance selected | inst0_MSN | inst1_MSN | Address translation
0 | <01010 | Instance0 | MSN | 1 | A[15:0] = dau_dcu_adr[20:5]
0 | >=01010 | Instance1 | 1 | MSN | A[15:0] = dau_dcu_adr[21:5] − hA000
1 | — | Instance0 and Instance1 | MSN | MSN | —

[2393] The instance selection and address translation logic is shown in FIG. 102.

[2394] The address translation and instance decode logic also increments the address presented to the 20 Mbit DRAM in the case of a page mode write. Pseudo code is given below.

    if rising_edge(dau_dcu_valid) then
        //capture the address from the DAU
        next_cmdadr[21:5] = dau_dcu_adr[21:5]
    elsif pagemode_adr_inc == 1 then
        //increment the address
        next_cmdadr[21:5] = cmdadr[21:5] + 1
    else
        next_cmdadr[21:5] = cmdadr[21:5]

    if rising_edge(dau_dcu_valid) then
        //capture the address from the DAU
        adr_var[21:5] := dau_dcu_adr[21:5]
    else
        adr_var[21:5] := cmdadr[21:5]

    if adr_var[21:17] < 01010 then
        //choose instance0
        instance_sel = 0
        A[15:0] = adr_var[20:5]
    else
        //choose instance1
        instance_sel = 1
        A[15:0] = adr_var[21:5] − hA000

[2395] Pseudo code for the select logic, SEL0, for DRAM Instance0 is given below.

    //instance0 selected or refresh
    if instance_sel == 0 OR dau_dcu_refresh == 1 then
        inst0_MSN = MSN
    else
        inst0_MSN = 1

[2396] Pseudo code for the select logic, SEL1, for DRAM instance1 is given below.

    //instance1 selected or refresh
    if instance_sel == 1 OR dau_dcu_refresh == 1 then
        inst1_MSN = MSN
    else
        inst1_MSN = 1
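The instance decode and address translation rules can be exercised in software. The following Python sketch is a behavioural illustration only, not the hardware implementation: the function name `select_instance` and the tuple it returns are assumptions introduced here, and the 17-bit value of dau_dcu_adr[21:5] is passed in as a plain integer.

```python
WORDS_PER_INSTANCE = 0xA000   # 10 Mbit / 256 bits per word
LAST_WORD = 0x13FFF           # last 256-bit word in the 20 Mbit DRAM


def select_instance(word_adr, refresh=False):
    """Model the instance select and address translation rules.

    word_adr -- 17-bit value of dau_dcu_adr[21:5]
    Returns (instance_sel, A[15:0], inst0_selected, inst1_selected).
    """
    if not 0 <= word_adr <= LAST_WORD:
        raise ValueError("address outside the 20 Mbit DRAM")

    upper5 = word_adr >> 12            # corresponds to dau_dcu_adr[21:17]
    if upper5 < 0b01010:
        instance_sel = 0
        local = word_adr & 0xFFFF      # corresponds to dau_dcu_adr[20:5]
    else:
        instance_sel = 1
        local = word_adr - WORDS_PER_INSTANCE

    # A refresh selects both macros regardless of the address decode.
    inst0_selected = refresh or instance_sel == 0
    inst1_selected = refresh or instance_sel == 1
    return instance_sel, local, inst0_selected, inst1_selected


for adr in (0x00000, 0x09FFF, 0x0A000, 0x13FFF):
    print(hex(adr), select_instance(adr))
```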

[2397] During a random read, the read data is returned, on dcu_dau_rdata, after time T_(acc), the random access time, which varies between 3 and 8 ns (see Table). To avoid any metastability issues the read data must be captured by a flip-flop which is enabled 2 pclk cycles or 12.5 ns after the DRAM access has been started. The DCU generates the enable signal dcu_dau_rvalid to capture dcu_dau_rdata.

[2398] The byte write mask dau_dcu_wmask[31:0] must be expanded to the bit write mask bitwritemask[255:0] needed by the DRAM.
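The mask expansion replicates each byte-enable bit across the eight bits of the byte it controls. A minimal Python sketch follows; the function name `expand_byte_mask` is hypothetical, and the mapping of mask bit i to data bits [8i+7:8i] is an assumption, since the text does not state the byte ordering.

```python
def expand_byte_mask(wmask):
    """Expand a 32-bit byte write mask into a 256-bit bit write mask.

    Assumes bit i of wmask enables data bits [8*i+7 : 8*i].
    """
    if not 0 <= wmask < (1 << 32):
        raise ValueError("wmask must fit in 32 bits")
    bitwritemask = 0
    for byte in range(32):
        if (wmask >> byte) & 1:
            bitwritemask |= 0xFF << (8 * byte)
    return bitwritemask


# Enable only the lowest two bytes of the 256-bit word.
print(hex(expand_byte_mask(0b11)))
```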

[2399] 20.14.5 DAU-DCU Interface Description

[2400] The DCU asserts dcu_dau_adv in the MSN2 state to indicate to the DAU to supply the next command. dcu_dau_adv causes the DAU to perform arbitration in the MSN2 cycle. The resulting command is available to the DCU in the following cycle, the RST state. The timing is shown in FIG. 103. The command to the DRAM must be valid in the RST and MSN1 states, or at least meet the hold time requirement to the MSN falling edge at the start of the MSN1 state.

[2401] Note that the DAU issues a valid arbitration result following every dcu_dau_adv pulse. If no unit is requesting DRAM access, then a fall-back refresh request will be issued. When dau_dcu_refresh is asserted the operation is a refresh and dau_dcu_adr, dau_dcu_rwn and dau_dcu_cduwpage are ignored.

[2402] The DCU generates a second signal, dcu_dau_wadv, which is asserted in the RST state. This indicates to the DAU that it can perform arbitration in advance for non-CPU writes. The reason for performing arbitration in advance for non-CPU writes is explained in “Command Multiplexor Sub-block”.

TABLE 136 Command Multiplexor Sub-block IO Definition
Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System Clock
prst_n | 1 | In | System reset, synchronous active low
DIU Read Interface to SoPEC Units
<unit>_diu_radr[21:5] | 17 | In | Read address to DIU, 17 bits wide (256-bit aligned word).
diu_<unit>_rack | 1 | Out | Acknowledge from DIU that read request has been accepted and new read address can be placed on <unit>_diu_radr
DIU Write Interface to SoPEC Units
<unit>_diu_wadr[21:5] | 17 | In | Write address to DIU except CPU, SCB, CDU, 17 bits wide (256-bit aligned word)
cpu_diu_wadr[21:4] | 22 | In | CPU Write address to DIU (128-bit aligned address.)
cpu_diu_wmask | 16 | In | Byte enables for CPU write.
cdu_diu_wadr[21:3] | 19 | In | CDU Write address to DIU, 19 bits wide (64-bit aligned word). Addresses cannot cross a 256-bit word DRAM boundary.
diu_<unit>_wack | 1 | Out | Acknowledge from DIU that write request has been accepted and new write address can be placed on <unit>_diu_wadr
Outputs to CPU Interface and Arbitration Logic sub-block
re_arbitrate | 1 | Out | Signal telling the arbitration logic to choose the next arbitration winner.
re_arbitrate_wadv | 1 | Out | Signal telling the arbitration logic to choose the next arbitration winner for non-CPU writes 2 timeslots in advance
Debug Outputs to CPU Configuration and Arbitration Logic Sub-block
write_sel | 5 | Out | Signal indicating the SoPEC Unit for which the current write transaction is occurring. Encoding is described in Table .
write_complete | 1 | Out | Signal indicating that write transaction to SoPEC Unit indicated by write_sel is complete.
Inputs from CPU Interface and Arbitration Logic sub-block
arb_gnt | 1 | In | Signal lasting 1 cycle which indicates arbitration has occurred and arb_sel is valid.
arb_sel | 5 | In | Signal indicating which requesting SoPEC Unit has won arbitration. Encoding is described in Table .
dir_sel | 2 | In | Signal indicating which sense of access is associated with arb_sel. 00: issue non-CPU write; 01: read winner; 10: write winner; 11: refresh winner
Inputs from Read Write Multiplexor Sub-block
write_data_valid | 2 | In | Signal indicating that valid write data is available for the current command. 00 = not valid; 01 = CPU write data valid; 10 = non-CPU write data valid; 11 = both CPU and non-CPU write data valid
wdata | 256 | In | 256-bit non-CPU write data
cpu_wdata | 32 | In | 32-bit CPU write data
Outputs to Read Write Multiplexor Sub-block
write_data_accept | 2 | Out | Signal indicating the Command Multiplexor has accepted the write data from the write multiplexor. 00 = not valid; 01 = accepts CPU write data; 10 = accepts non-CPU write data; 11 = not valid
Inputs from DCU
dcu_dau_adv | | In | Signal indicating to DAU to supply next command to DCU
dcu_dau_wadv | | In | Signal indicating to DAU to initiate next non-CPU write
Outputs to DCU
dau_dcu_adr[21:5] | 17 | Out | Signal indicating the address for the DRAM access. This is a 256-bit aligned DRAM address.
dau_dcu_rwn | 1 | Out | Signal indicating the direction for the DRAM access (1 = read, 0 = write).
dau_dcu_cduwpage | 1 | Out | Signal indicating if access is a CDU write page mode access (1 = CDU page mode, 0 = not CDU page mode).
dau_dcu_refresh | 1 | Out | Signal indicating that a refresh command is to be issued. If asserted dau_dcu_adr, dau_dcu_rwn and dau_dcu_cduwpage are ignored.
dau_dcu_wdata | 256 | Out | 256-bit write data to DCU
dau_dcu_wmask | 32 | Out | Byte encoded write data mask for 256-bit dau_dcu_wdata to DCU

[2403] The DCU state-machine can stall in the MSN2 state when the signal dau_dcu_msn2stall is asserted by the DAU Arbitration Logic.

[2404] The states of the DCU state-machine are summarised in Table 127.

TABLE 127 States of the DCU state-machine
State | Description
RST | Restore state
MSN1 | Macro select state 1
MSN2 | Macro select state 2

[2405] 20.14.6 DCU State Machines

[2406] The IBM DRAM has a simple SRAM like interface. The DRAM is accessed as a single bank. The state machine to access the DRAM is shown in FIG. 104.

[2407] The signal pagemode_adr_inc is exported from the DCU as dcu_dau_cduwaccept. dcu_dau_cduwaccept tells the DAU to supply the next write data to the DRAM.

[2408] 20.14.7 Cu-11 DRAM Timing Diagrams

[2409] The IBM Cu-11 embedded DRAM datasheet is referenced as [16].

[2410] Table 128 shows the timing parameters which must be obeyed for the IBM embedded DRAM.

TABLE 128 1.5 V Cu-11 DRAM a.c. parameters
Symbol | Parameter | Min | Max | Units
T_(set) | Input setup to MSN/PGN | 1 | — | ns
T_(hld) | Input hold to MSN/PGN | 2 | — | ns
T_(acc) | Random access time | 3 | 8 | ns
T_(act) | MSN active time | 8 | 100 k | ns
T_(res) | MSN restore time | 4 | — | ns
T_(cyc) | Random R/W cycle time | 12 | — | ns
T_(rfc) | Refresh cycle time | 12 | — | ns
T_(accp) | Page mode access time | 1 | 3.9 | ns
T_(pa) | PGN active time | 1.6 | — | ns
T_(pr) | PGN restore time | 1.6 | — | ns
T_(pcyc) | PGN cycle time | 4 | — | ns
T_(mprd) | MSN to PGN restore delay | 6 | — | ns
T_(actp) | MSN active for page mode | 12 | — | ns
T_(ref) | Refresh period | — | 3.2 | ms
T_(pamr) | Page active to MSN restore | 4 | — | ns

[2411] The IBM DRAM is asynchronous. In SoPEC it interfaces to signals clocked on pclk. The following timing diagrams show how the timing parameters in Table 128 are satisfied in SoPEC.

[2412] 20.14.8 Definition of DAU IO

TABLE 129 DAU interface
Port Name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | SoPEC Functional clock
prst_n | 1 | In | Active-low, synchronous reset in pclk domain
dau_dcu_reset_n | 1 | Out | Active-low, synchronous reset in pclk domain. This reset signal, exported to the DCU, incorporates the locally captured DAU version of hard reset (prst_n) and the soft reset configuration register bit “Reset”.
CPU Interface
cpu_adr | 22 | In | CPU address bus for both DRAM and configuration register access. 9 bits (bits 10:2) are required to decode the configuration register address space. 22 bits can address the DRAM at byte level. DRAM addresses cannot cross a 256-bit word DRAM boundary.
cpu_dataout | 32 | In | Shared write data bus from the CPU for DRAM and configuration data
diu_cpu_data | 32 | Out | Configuration, status and debug read data bus to the CPU
diu_cpu_debug_valid | 1 | Out | Signal indicating the data on the diu_cpu_data bus is valid debug data.
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_acode | 2 | In | CPU access code signals. cpu_acode[0] - Program (0)/Data (1) access; cpu_acode[1] - User (0)/Supervisor (1) access. The DAU will only allow supervisor mode accesses to data space.
cpu_diu_sel | 1 | In | Block select from the CPU. When cpu_diu_sel is high both cpu_adr and cpu_dataout are valid
diu_cpu_rdy | 1 | Out | Ready signal to the CPU. When diu_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the block and for a read cycle this means the data on diu_cpu_data is valid.
diu_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
DIU Read Interface to SoPEC Units
<unit>_diu_rreq | 1 | In | SoPEC unit requests DRAM read. A read request must be accompanied by a valid read address.
<unit>_diu_radr[21:5] | 17 | In | Read address to DIU, 17 bits wide (256-bit aligned word). Note: “<unit>” refers to non-CPU requesters only. CPU addresses are provided via “cpu_adr”.
diu_<unit>_rack | 1 | Out | Acknowledge from DIU that read request has been accepted and new read address can be placed on <unit>_diu_radr
diu_data | 64 | Out | Data from DIU to SoPEC Units except CPU. First 64-bits is bits 63:0 of 256 bit word; second 64-bits is bits 127:64 of 256 bit word; third 64-bits is bits 191:128 of 256 bit word; fourth 64-bits is bits 255:192 of 256 bit word
dram_cpu_data | 256 | Out | 256-bit data from DRAM to CPU.
diu_<unit>_rvalid | 1 | Out | Signal from DIU telling SoPEC Unit that valid read data is on the diu_data bus
DIU Write Interface to SoPEC Units
<unit>_diu_wreq | 1 | In | SoPEC unit requests DRAM write. A write request must be accompanied by a valid write address. Note: “<unit>” refers to non-CPU requesters only.
<unit>_diu_wadr[21:5] | 17 | In | Write address to DIU except CPU, CDU, 17 bits wide (256-bit aligned word). Note: “<unit>” refers to non-CPU requesters, excluding the CDU.
scb_diu_wmask[7:0] | 8 | In | Byte write enables applicable to a given 64-bit quarter-word transferred from the SCB. Note that different mask values are used with each quarter-word. Requirement for the USB host core.
diu_cpu_write_rdy | 1 | Out | Flag indicating that the CPU posted write buffer is empty.
cpu_diu_wdatavalid | 1 | In | Write enable for the CPU posted write buffer. Also confirms that the CPU write data, address and mask are valid.
cpu_diu_wdata | 128 | In | CPU write data which is loaded into the posted write buffer.
cpu_diu_wadr[21:4] | 18 | In | 128-bit aligned CPU write address.
cpu_diu_wmask[15:0] | 16 | In | Byte enables for 128-bit CPU posted write.
cdu_diu_wadr[21:3] | 19 | In | CDU Write address to DIU, 19 bits wide (64-bit aligned word). Addresses cannot cross a 256-bit word DRAM boundary.
diu_<unit>_wack | 1 | Out | Acknowledge from DIU that write request has been accepted and new write address can be placed on <unit>_diu_wadr
<unit>_diu_data[63:0] | 64 | In | Data from SoPEC Unit to DIU except CPU. First 64-bits is bits 63:0 of 256 bit word; second 64-bits is bits 127:64 of 256 bit word; third 64-bits is bits 191:128 of 256 bit word; fourth 64-bits is bits 255:192 of 256 bit word. Note: “<unit>” refers to non-CPU requesters only.
<unit>_diu_wvalid | 1 | In | Signal from SoPEC Unit indicating that data on <unit>_diu_data is valid. Note: “<unit>” refers to non-CPU requesters only.
Outputs to DCU
dau_dcu_msn2stall | 1 | Out | Signal from DAU Arbitration Logic which, when asserted, stalls DCU in MSN2 state.
dau_dcu_adr[21:5] | 17 | Out | Signal indicating the address for the DRAM access. This is a 256-bit aligned DRAM address.
dau_dcu_rwn | 1 | Out | Signal indicating the direction for the DRAM access (1 = read, 0 = write).
dau_dcu_cduwpage | 1 | Out | Signal indicating if access is a CDU write page mode access (1 = CDU page mode, 0 = not CDU page mode).
dau_dcu_refresh | 1 | Out | Signal indicating that a refresh command is to be issued. If asserted dau_dcu_adr, dau_dcu_rwn and dau_dcu_cduwpage are ignored.
dau_dcu_wdata | 256 | Out | 256-bit write data to DCU
dau_dcu_wmask | 32 | Out | Byte-encoded write data mask for 256-bit dau_dcu_wdata to DCU. Polarity: A “1” in a bit field of dau_dcu_wmask means that the corresponding byte in the 256-bit dau_dcu_wdata is written to DRAM.
Inputs from DCU
dcu_dau_adv | 1 | In | Signal indicating to DAU to supply next command to DCU
dcu_dau_wadv | 1 | In | Signal indicating to DAU to initiate next non-CPU write
dcu_dau_refreshcomplete | 1 | In | Signal indicating that the DCU has completed a refresh.
dcu_dau_rdata | 256 | In | 256-bit read data from DCU.
dcu_dau_rvalid | 1 | In | Signal indicating valid read data on dcu_dau_rdata.

[2413] The CPU subsystem bus interface is described in more detail in Section 11.4.3. The DAU block will only allow supervisor-mode accesses to update its configuration registers (i.e. cpu_acode[1:0]=b11). All other accesses will result in diu_cpu_berr being asserted.

[2414] 20.14.9 DAU Configuration Registers

TABLE 130 DAU configuration registers
Address (DIU_base +) | Register | # bits | Reset | Description
Reset
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the DIU. This register can be read to indicate the reset state: 0 - reset in progress; 1 - reset not in progress
Refresh
0x04 | RefreshPeriod | 9 | 0x063 | Refresh controller. When set to 0 refresh is off, otherwise the value indicates the number of cycles, less one, between each refresh. [Note that for a system clock frequency of 160 MHz, a value exceeding 0x63 (indicating a 100-cycle refresh period) should not be programmed, or the DRAM will malfunction.]
Timeslot allocation and control
0x08 | NumMainTimeslots | 6 | 0x01 | Number of main timeslots (1-64) less one
0x0C | CPUPreAccessTimeslots | 4 | 0x0 | (CPUPreAccessTimeslots + 1) main slots out of a total of (CPUTotalTimeslots + 1) are preceded by a CPU access.
0x10 | CPUTotalTimeslots | 4 | 0x0 | (CPUPreAccessTimeslots + 1) main slots out of a total of (CPUTotalTimeslots + 1) are preceded by a CPU access.
0x100-0x1FC | MainTimeslot[63:0] | 64 × 4 | [63:1][3:0] = 0x0; [0][3:0] = 0xE | Programmable main timeslots (up to 64 main timeslots).
0x200 | ReadRoundRobinLevel | 12 | 0x000 | For each read requester plus refresh: 0 = level1 of round-robin; 1 = level2 of round-robin. The bit order is defined in Table .
0x204 | EnableCPURoundRobin | 1 | 0x1 | Allows the CPU to participate in the unused read round-robin scheme. If disabled, the shared CPU/refresh round-robin position is dedicated solely to refresh.
0x208 | RotationSync | 1 | 0x1 | Writing 0, followed by 1 to this bit allows the timeslot rotation to advance on a cycle basis which can be determined by the CPU.
0x20C | minNonCPUReadAdr | 12 | 0x800 | 12 MSBs of lowest DRAM address which may be read by non-CPU requesters.
0x210 | minDWUWriteAdr | 12 | 0x800 | 12 MSBs of lowest DRAM address which may be written to by the DWU.
0x214 | minNonCPUWriteAdr | 12 | 0x800 | 12 MSBs of lowest DRAM address which may be written to by non-CPU requesters other than the DWU.
Debug
0x300 | DebugSelect[11:2] | 10 | 0x304 | Debug address select. Indicates the address of the register to report on the diu_cpu_data bus when it is not otherwise being used. When this signal carries debug information the signal diu_cpu_debug_valid will be asserted.
Debug: arbitration and performance
0x304 | ArbitrationHistory | 22 | — | Bit 0 = arb_gnt; Bit 1 = arb_executed; Bit 6:2 = arb_sel[4:0]; Bit 12:7 = timeslot_number[5:0]; Bit 15:13 = access_type[2:0]; Bit 16 = back2back_non_cpu_write; Bit 17 = sticky_back2back_non_cpu_write (sticky version of same, cleared on reset); Bit 18 = rotation_sync; Bit 20:19 = rotation_state; Bit 21 = sticky_invalid_non_cpu_adr. See Section 20.14.9.2 DIU Debug for a description of the fields. Read only register.
0x308 | DIUPerformance | 31 | — | Bit 0 = cpu_diu_rreq; Bit 1 = scb_diu_rreq; Bit 2 = cdu_diu_rreq; Bit 3 = cfu_diu_rreq; Bit 4 = lbd_diu_rreq; Bit 5 = sfu_diu_rreq; Bit 6 = td_diu_rreq; Bit 7 = tfs_diu_rreq; Bit 8 = hcu_diu_rreq; Bit 9 = dnc_diu_rreq; Bit 10 = llu_diu_rreq; Bit 11 = pcu_diu_rreq; Bit 12 = cpu_diu_wreq; Bit 13 = scb_diu_wreq; Bit 14 = cdu_diu_wreq; Bit 15 = sfu_diu_wreq; Bit 16 = dwu_diu_wreq; Bit 17 = refresh_req; Bit 22:18 = read_sel[4:0]; Bit 23 = read_complete; Bit 28:24 = write_sel[4:0]; Bit 29 = write_complete; Bit 30 = dcu_dau_refreshcomplete. See Section 20.14.9.2 DIU Debug for a description of the fields. Read only register.
Debug DIU read requesters interface signals
0x30C | CPUReadInterface | 25 | — | Bit 0 = cpu_diu_rreq; Bit 22:1 = cpu_adr[21:0]; Bit 23 = diu_cpu_rack; Bit 24 = diu_cpu_rvalid. Read only register.
0x310 | SCBReadInterface | 20 | — | Bit 0 = scb_diu_rreq; Bit 17:1 = scb_diu_radr[21:5]; Bit 18 = diu_scb_rack; Bit 19 = diu_scb_rvalid. Read only register.
0x314 | CDUReadInterface | 20 | — | Bit 0 = cdu_diu_rreq; Bit 17:1 = cdu_diu_radr[21:5]; Bit 18 = diu_cdu_rack; Bit 19 = diu_cdu_rvalid. Read only register.
0x318 | CFUReadInterface | 20 | — | Bit 0 = cfu_diu_rreq; Bit 17:1 = cfu_diu_radr[21:5]; Bit 18 = diu_cfu_rack; Bit 19 = diu_cfu_rvalid. Read only register.
0x31C | LBDReadInterface | 20 | — | Bit 0 = lbd_diu_rreq; Bit 17:1 = lbd_diu_radr[21:5]; Bit 18 = diu_lbd_rack; Bit 19 = diu_lbd_rvalid. Read only register.
0x320 | SFUReadInterface | 20 | — | Bit 0 = sfu_diu_rreq; Bit 17:1 = sfu_diu_radr[21:5]; Bit 18 = diu_sfu_rack; Bit 19 = diu_sfu_rvalid. Read only register.
0x324 | TDReadInterface | 20 | — | Bit 0 = td_diu_rreq; Bit 17:1 = td_diu_radr[21:5]; Bit 18 = diu_td_rack; Bit 19 = diu_td_rvalid. Read only register.
0x328 | TFSReadInterface | 20 | — | Bit 0 = tfs_diu_rreq; Bit 17:1 = tfs_diu_radr[21:5]; Bit 18 = diu_tfs_rack; Bit 19 = diu_tfs_rvalid. Read only register.
0x32C | HCUReadInterface | 20 | — | Bit 0 = hcu_diu_rreq; Bit 17:1 = hcu_diu_radr[21:5]; Bit 18 = diu_hcu_rack; Bit 19 = diu_hcu_rvalid. Read only register.
0x330 | DNCReadInterface | 20 | — | Bit 0 = dnc_diu_rreq; Bit 17:1 = dnc_diu_radr[21:5]; Bit 18 = diu_dnc_rack; Bit 19 = diu_dnc_rvalid. Read only register.
0x334 | LLUReadInterface | 20 | — | Bit 0 = llu_diu_rreq; Bit 17:1 = llu_diu_radr[21:5]; Bit 18 = diu_llu_rack; Bit 19 = diu_llu_rvalid. Read only register.
0x338 | PCUReadInterface | 20 | — | Bit 0 = pcu_diu_rreq; Bit 17:1 = pcu_diu_radr[21:5]; Bit 18 = diu_pcu_rack; Bit 19 = diu_pcu_rvalid. Read only register.
Debug DIU write requesters interface signals
0x33C | CPUWriteInterface | 27 | — | Bit 0 = cpu_diu_wreq; Bit 22:1 = cpu_adr[21:0]; Bit 24:23 = cpu_diu_wmask[1:0]; Bit 25 = diu_cpu_wack; Bit 26 = cpu_diu_wvalid. Read only register.
0x340 | SCBWriteInterface | 20 | — | Bit 0 = scb_diu_wreq; Bit 17:1 = scb_diu_wadr[21:5]; Bit 18 = diu_scb_wack; Bit 19 = scb_diu_wvalid. Read only register.
0x344 | CDUWriteInterface | 22 | — | Bit 0 = cdu_diu_wreq; Bit 19:1 = cdu_diu_wadr[21:3]; Bit 20 = diu_cdu_wack; Bit 21 = cdu_diu_wvalid. Read only register.
0x348 | SFUWriteInterface | 20 | — | Bit 0 = sfu_diu_wreq; Bit 17:1 = sfu_diu_wadr[21:5]; Bit 18 = diu_sfu_wack; Bit 19 = sfu_diu_wvalid. Read only register.
0x34C | DWUWriteInterface | 20 | — | Bit 0 = dwu_diu_wreq; Bit 17:1 = dwu_diu_wadr[21:5]; Bit 18 = diu_dwu_wack; Bit 19 = dwu_diu_wvalid. Read only register.
Debug DAU-DCU interface signals
0x350 | DAU-DCUInterface | 25 | — | Bit 16:0 = dau_dcu_adr[21:5]; Bit 17 = dau_dcu_rwn; Bit 18 = dau_dcu_cduwpage; Bit 19 = dau_dcu_refresh; Bit 20 = dau_dcu_msn2stall; Bit 21 = dcu_dau_adv; Bit 22 = dcu_dau_wadv; Bit 23 = dcu_dau_refreshcomplete; Bit 24 = dcu_dau_rvalid. Read only register.
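The bit assignments listed for the ArbitrationHistory register (DIU_base + 0x304) lend themselves to a table-driven decoder. The Python sketch below is illustrative debug tooling, not part of the described hardware: the names `ARBITRATION_HISTORY_FIELDS` and `decode_arbitration_history` are assumptions introduced here, and the raw register value is assumed to have been read by some other means.

```python
# (field name, least significant bit, width), following the bit
# assignments listed for the ArbitrationHistory register.
ARBITRATION_HISTORY_FIELDS = [
    ("arb_gnt", 0, 1),
    ("arb_executed", 1, 1),
    ("arb_sel", 2, 5),
    ("timeslot_number", 7, 6),
    ("access_type", 13, 3),
    ("back2back_non_cpu_write", 16, 1),
    ("sticky_back2back_non_cpu_write", 17, 1),
    ("rotation_sync", 18, 1),
    ("rotation_state", 19, 2),
    ("sticky_invalid_non_cpu_adr", 21, 1),
]


def decode_arbitration_history(value):
    """Split a 22-bit ArbitrationHistory value into named fields."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, lsb, width in ARBITRATION_HISTORY_FIELDS
    }


# A made-up sample value: an executed arbitration, arb_sel = 0xE,
# timeslot 5, access_type = 010.
sample = 1 | (1 << 1) | (0xE << 2) | (5 << 7) | (0b010 << 13)
print(decode_arbitration_history(sample))
```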

[2415] Each main timeslot can be assigned a SoPEC DIU requestoraccording to Table 131. TABLE 131 SoPEC DIU requester encoding for maintimeslots. Index Index Name (binary) (HEX) Write SCB(W) b0_0000 0x00CDU(W) b0001 0x1 SFU(W) b0010 0x2 DWU b0011 0x3 Read SCB(R) b0100 0x4CDU(R) b0101 0x5 CFU b0110 0x6 LBD b0111 0x7 SFU(R) b1000 0x8 TE(TD)b1001 0x9 TE(TFS) b1010 0xA HCU b1011 0xB DNC b1100 0xC LLU b1101 0xDPCU b1110 0xE

[2416] ReadRoundRobinLevel and ReadRoundRobinEnable registers are encoded in the bit order defined in Table 132.

TABLE 132 Read round-robin registers bit order

Name          Bit index
SCB(R)        0
CDU(R)        1
CFU           2
LBD           3
SFU(R)        4
TE(TD)        5
TE(TFS)       6
HCU           7
DNC           8
LLU           9
PCU           10
CPU/Refresh   11

[2417] 20.14.9.1 Configuration Register Reset State

[2418] The RefreshPeriod configuration register has a reset value of 0x063 which ensures that a refresh will occur every 100 cycles and the contents of the DRAM will remain valid.
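The relationship between the register value and the refresh interval can be checked with a short calculation. The following is a minimal sketch only: the function name is hypothetical, and it assumes the register semantics given in the DAU configuration register table (a value of 0 turns refresh off, otherwise the value is the number of cycles, less one, between refreshes).

```python
from typing import Optional

REFRESH_PERIOD_BITS = 9  # RefreshPeriod is a 9-bit register


def refresh_interval_cycles(refresh_period: int) -> Optional[int]:
    """Cycles between refreshes for a RefreshPeriod value, or None if refresh is off."""
    if not 0 <= refresh_period < (1 << REFRESH_PERIOD_BITS):
        raise ValueError("RefreshPeriod is a 9-bit value")
    if refresh_period == 0:
        return None  # a value of 0 disables refresh
    return refresh_period + 1  # the register stores the period less one


reset_interval = refresh_interval_cycles(0x063)
print(reset_interval)
```

With the reset value 0x063 (decimal 99) the interval is 99 + 1 cycles, matching the 100-cycle figure stated above.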

[2419] The CPUPreAccessTimeslots and CPUTotalTimeslots configuration registers both have a reset value of 0x0. Matching values in these two registers means that every slot has a CPU pre-access. NumMainTimeslots is reset to 0x1, so there are just 2 main timeslots in the rotation initially. These slots alternate between SCB writes and PCU reads, as defined by the reset value of MainTimeslot[63:0], thus respecting at reset time the general rule that adjacent non-CPU writes are not permitted.
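The reset rotation can be modelled in a few lines to confirm that it satisfies the rule on adjacent non-CPU writes. This is an illustrative sketch, not part of the described hardware: the helper names are hypothetical, the requester codes are taken from Table 131, and the rotation is assumed to wrap from its last slot back to slot 0.

```python
# Requester codes from Table 131 (4-bit main timeslot encoding).
WRITE_CODES = {0x0: "SCB(W)", 0x1: "CDU(W)", 0x2: "SFU(W)", 0x3: "DWU"}
READ_CODES = {0x4: "SCB(R)", 0x5: "CDU(R)", 0x6: "CFU", 0x7: "LBD",
              0x8: "SFU(R)", 0x9: "TE(TD)", 0xA: "TE(TFS)", 0xB: "HCU",
              0xC: "DNC", 0xD: "LLU", 0xE: "PCU"}
NAMES = {**WRITE_CODES, **READ_CODES}


def reset_main_timeslots():
    """Reset value of MainTimeslot[63:0]: slot 0 = 0xE, slots 1..63 = 0x0."""
    return [0xE] + [0x0] * 63


def active_rotation(main_timeslots, num_main_timeslots):
    """NumMainTimeslots holds the number of active slots less one."""
    return main_timeslots[: num_main_timeslots + 1]


def has_adjacent_writes(rotation):
    """True if two non-CPU writes are neighbours in the (assumed circular) rotation."""
    n = len(rotation)
    return any(rotation[i] in WRITE_CODES and rotation[(i + 1) % n] in WRITE_CODES
               for i in range(n))


rotation = active_rotation(reset_main_timeslots(), 0x1)
print([NAMES[code] for code in rotation], has_adjacent_writes(rotation))
```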

[2420] The first access issued by the DIU after reset will be a refresh.

[2421] 20.14.9.2 DIU Debug

[2422] External visibility of the DIU must be provided for debug purposes. To facilitate this, debug registers are added to the DIU address space.

[2423] The DIU CPU system data bus diu_cpu_data[31:0] returns configuration and status register information to the CPU. When a configuration or status register is not being read by the CPU, debug data is returned on diu_cpu_data[31:0] instead. An accompanying active high diu_cpu_debug_valid signal is used to indicate when the data bus contains valid debug data. The DIU features a DebugSelect register that controls a local multiplexor to determine which register is output on diu_cpu_data[31:0].

[2424] Three kinds of debug information are gathered:

[2425] a. The order and access type of DIU requesters winning arbitration.

[2426] This information can be obtained by observing the signals in the ArbitrationHistory debug register at DIU_Base+0x304 described in Table 133.

TABLE 133 ArbitrationHistory debug register description, DIU_base + 0x304

Field name (bits): Description
arb_gnt (1): Signal lasting 1 cycle which is asserted in the cycle following a main arbitration or pre-arbitration.
arb_executed (1): Signal lasting 1 cycle which indicates that an arbitration result has actually been executed. Is used to differentiate between pre-arbitration and main arbitration, both of which cause arb_gnt to be asserted. If arb_executed and arb_gnt are both high, then a main (executed) arbitration is indicated.
arb_sel (5): Signal indicating which requesting SoPEC Unit has won arbitration. Encoding is described in Table 134. Refresh winning arbitration is indicated by access_type.
timeslot_number (6): Signal indicating which main timeslot is either currently being serviced, or about to be serviced. The latter case applies where a main slot is pre-empted by a CPU pre-access or a scheduled refresh.
access_type (3): Signal indicating the origin of the winning arbitration.
  000 = Standard CPU pre-access.
  001 = Scheduled refresh.
  010 = Standard non-CPU timeslot.
  011 = CPU access via unused read/write slot, re-allocated by round robin.
  100 = Non-CPU write via unused write slot, re-allocated at pre-arbitration.
  101 = Non-CPU read via unused read/write slot, re-allocated by round robin.
  110 = Refresh via unused read/write slot, re-allocated by round robin.
  111 = CPU/Refresh access due to RotationSync = 0.
back2back_non_cpu_write (1): Instantaneous indicator of attempted illegal back-to-back non-CPU write. (Recall from section 20.7.2.3 on page 212 that the second write of any such pair is disregarded and re-allocated via the unused read round-robin scheme.)
sticky_back2back_non_cpu_write (1): Sticky version of same, cleared on reset.
rotation_sync (1): Current value of the RotationSync configuration bit.
rotation_state (2): These bits indicate the current status of pre-arbitration and main timeslot rotation, as a result of the RotationSync setting.
  00 = Pre-arb enabled, rotation enabled.
  01 = Pre-arb disabled, rotation enabled.
  10 = Pre-arb disabled, rotation disabled.
  11 = Pre-arb enabled, rotation disabled.
  00 is the normal functional setting when RotationSync is 1.
  01 indicates that pre-arbitration has halted at the end of its rotation because of RotationSync having been cleared. However the main arbitration has yet to finish its current rotation.
  10 indicates that both pre-arb and the main rotation have halted, due to RotationSync being 0, and that only CPU accesses and refreshes are allowed.
  11 indicates that RotationSync has just been changed from 0 to 1 and that pre-arbitration is being given a head start to look ahead for non-CPU writes, in advance of the main rotation starting up again.
sticky_invalid_non_cpu_adr (1): Sticky bit to indicate an attempted non-CPU access with an invalid address. Cleared by reset or by an explicit write by the CPU.
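Debug software reading ArbitrationHistory would typically translate the field values into readable text. The sketch below shows one way to do so. It is illustrative only: the function and dictionary names are hypothetical, and because the bit positions of the fields within the register are not given here, the code takes already-extracted field values rather than a raw register word.

```python
# access_type encodings, as listed for the ArbitrationHistory register.
ACCESS_TYPE = {
    0b000: "Standard CPU pre-access",
    0b001: "Scheduled refresh",
    0b010: "Standard non-CPU timeslot",
    0b011: "CPU access via unused read/write slot (round robin)",
    0b100: "Non-CPU write via unused write slot (pre-arbitration)",
    0b101: "Non-CPU read via unused read/write slot (round robin)",
    0b110: "Refresh via unused read/write slot (round robin)",
    0b111: "CPU/Refresh access due to RotationSync = 0",
}

# rotation_state encodings: (pre-arbitration enabled, rotation enabled).
ROTATION_STATE = {
    0b00: (True, True),
    0b01: (False, True),
    0b10: (False, False),
    0b11: (True, False),
}


def classify_arbitration(arb_gnt: int, arb_executed: int) -> str:
    """Distinguish pre-arbitration from main (executed) arbitration."""
    if not arb_gnt:
        return "none"
    return "main" if arb_executed else "pre-arbitration"


print(classify_arbitration(1, 1), "/", ACCESS_TYPE[0b001])
```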

[2427] TABLE 134 arb_sel, read_sel and write_sel encoding

          Name      Index (binary)  Index (HEX)
Write     SCB(W)    b0_0000         0x00
          CDU(W)    b0_0001         0x01
          SFU(W)    b0_0010         0x02
          DWU       b0_0011         0x03
Read      SCB(R)    b0_0100         0x04
          CDU(R)    b0_0101         0x05
          CFU       b0_0110         0x06
          LBD       b0_0111         0x07
          SFU(R)    b0_1000         0x08
          TE(TD)    b0_1001         0x09
          TE(TFS)   b0_1010         0x0A
          HCU       b0_1011         0x0B
          DNC       b0_1100         0x0C
          LLU       b0_1101         0x0D
          PCU       b0_1110         0x0E
Refresh   Refresh   b0_1111         0x0F
CPU       CPU(R)    b1_0000         0x10
          CPU(W)    b1_0001         0x11

[2428] The encoding for arb_sel is described in Table 134.

[2429] b. The time between a DIU requester requesting an access and completing the access. This information can be obtained by observing the signals in the DIUPerformance debug register at DIU_Base+0x308 described in Table 135. The encoding for read_sel and write_sel is described in Table 134. The data collected from DIUPerformance can be post-processed to count the number of cycles between a unit requesting DIU access and the access being completed.

TABLE 135 DIUPerformance debug register description, DIU_base + 0x308

Field name (bits): Description
<unit>_diu_rreq (12): Signal indicating that SoPEC unit requests DRAM read.
<unit>_diu_wreq (5): Signal indicating that SoPEC unit requests DRAM write.
refresh_req (1): Signal indicating that refresh has requested a DIU access.
read_sel[4:0] (5): Signal indicating the SoPEC Unit for which the current read transaction is occurring. Encoding is described in Table 134.
read_complete (1): Signal indicating that read transaction to SoPEC Unit indicated by read_sel is complete, i.e. that the last read data has been output by the DIU.
write_sel[4:0] (5): Signal indicating the SoPEC Unit for which the current write transaction is occurring. Encoding is described in Table 134.
write_complete (1): Signal indicating that write transaction to SoPEC Unit indicated by write_sel is complete, i.e. that the last write data has been transferred to the DIU.
dcu_refresh_complete (1): Signal indicating that refresh has completed.
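The post-processing mentioned above can be sketched as follows. This is an assumption-laden illustration rather than a description of any actual tool: the per-cycle sample format (a dictionary holding the set of units asserting rreq, the decoded read_sel unit name and the read_complete flag) and the function name are hypothetical, and the sketch handles one outstanding read per unit at a time.

```python
from typing import Dict, List


def read_latencies(samples: List[Dict], unit: str) -> List[int]:
    """Cycles from a unit first asserting rreq to read_complete for that unit."""
    latencies = []
    request_cycle = None
    for cycle, sample in enumerate(samples):
        # Remember the first cycle in which the request is seen.
        if request_cycle is None and unit in sample["rreq"]:
            request_cycle = cycle
        # Close the measurement when the read for this unit completes.
        if (request_cycle is not None and sample["read_complete"]
                and sample["read_sel"] == unit):
            latencies.append(cycle - request_cycle)
            request_cycle = None
    return latencies


idle = {"rreq": set(), "read_sel": None, "read_complete": False}
trace = [
    idle,
    {"rreq": {"LBD"}, "read_sel": None, "read_complete": False},
    idle, idle, idle,
    {"rreq": set(), "read_sel": "LBD", "read_complete": True},
]
print(read_latencies(trace, "LBD"))
```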

c. Interface Signals to DIU Requestors and DAU-DCU Interface

[2430] All interface signals, with the exception of data busses, at the interfaces between the DAU and DCU and DIU write and read requestors can be monitored in debug mode by observing debug registers DIU_Base+0x314 to DIU_Base+0x354.

[2431] 20.14.10 DRAM Arbitration Unit (DAU)

[2432] The DAU is shown in FIG. 101.

[2433] The DAU is composed of the following sub-blocks

[2434] a. CPU Configuration and Arbitration Logic sub-block.

[2435] b. Command Multiplexor sub-block.

[2436] c. Read and Write Data Multiplexor sub-block.

[2437] The function of the DAU is to supply DRAM commands to the DCU.

[2438] The DCU requests a command from the DAU by asserting dcu_dau_adv.

[2439] The DAU Command Multiplexor requests the Arbitration Logic sub-block to arbitrate the next DRAM access. The Command Multiplexor passes dcu_dau_adv as the re_arbitrate signal to the Arbitration Logic sub-block.

[2440] If the RotationSync bit has been cleared, then the arbitration logic grants exclusive access to the CPU and scheduled refreshes. If the bit has been set, regular arbitration occurs. A detailed description of RotationSync is given in section 20.14.12.2.1 on page 295.

[2441] Until the Arbitration Logic has a valid result it stalls the DCU by asserting dau_dcu_msn2stall. The Arbitration Logic then returns the selected arbitration winner to the Command Multiplexor which issues the command to the DRAM. The Arbitration Logic could stall, for example, if it selected a shared read bus access but the Read Multiplexor indicated it was busy by de-asserting read_cmd_rdy[1].

[2442] In the case of a read command the read data from the DRAM is multiplexed back to the read requestor by the Read Multiplexor. In the case of a write operation the Write Multiplexor multiplexes the write data from the selected DIU write requester to the DCU before the write command can occur. If the write data is not available then the Command Multiplexor will keep dau_dcu_valid de-asserted. This will stall the DCU until the write command is ready to be issued.

[2443] Arbitration for non-CPU writes occurs in advance. The DCU provides a signal dcu_dau_wadv which the Command Multiplexor issues to the Arbitration Logic as re_arbitrate_wadv. If arbitration is blocked by the Write Multiplexor being busy, as indicated by write_cmd_rdy[1] being de-asserted, then the Arbitration Logic will stall the DCU by asserting dau_dcu_msn2stall until the Write Multiplexor is ready.

[2444] 20.14.10.1 Read Accesses

[2445] The timing of a non-CPU DIU read access is shown in FIG. 109. Note re_arbitrate is asserted in the MSN2 state of the previous access.

[2446] Note the fixed timing relationship between the read acknowledgment and the first rvalid for all non-CPU reads. This means that the second and any later reads in a back-to-back non-CPU sequence have their acknowledgments asserted one cycle later, i.e. in the “MSN1” DCU state. The timing of a CPU DIU read access is shown in FIG. 110. Note re_arbitrate is asserted in the MSN2 state of the previous access.

[2447] Some points can be noted from FIG. 109 and FIG. 110.

[2448] DIU requests:

[2449] For non-CPU accesses the <unit>_diu_rreq signals are registered before the arbitration can occur.

[2450] For CPU accesses the cpu_diu_rreq signal is not registered to reduce CPU DIU access latency.

[2451] Arbitration occurs when the dcu_dau_adv signal from the DCU is asserted. The DRAM address for the arbitration winner is available in the next cycle, the RST state of the DCU.

[2452] The DRAM access starts in the MSN1 state of the DCU and completes in the RST state of the DCU.

[2453] Read data is available:

[2454] In the MSN2 cycle where it is output unregistered to the CPU

[2455] In the MSN2 cycle and registered in the DAU before being output in the next cycle to all other read requesters in order to ease timing.

[2456] The DIU protocol is in fact:

[2457] Pipelined i.e. the following transaction is initiated while the previous transfer is in progress.

[2458] Split transaction i.e. the transaction is split into independent address and data transfers.

[2459] Some general points should be noted in the case of CPU accesses:

[2460] Since the CPU request is not registered in the DIU before arbitration, then the CPU must generate the request, route it to the DAU and complete arbitration all in 1 cycle. To facilitate this, CPU access is arbitrated late in the arbitration cycle (see Section 20.14.12.2).

[2461] Since the CPU read data is not registered in the DAU and CPU read data is available 8 ns after the start of the access then 4.5 ns are available for routing and any shallow logic before the CPU read data is captured by the CPU (see Section 20.14.4).

[2462] The phases of CPU DIU read access are shown in FIG. 111. This matches the timing shown in Table 135.

[2463] 20.14.10.2 Write Accesses

[2464] CPU writes are posted into a 1-deep write buffer in the DIU and written to DRAM as shown below in FIG. 112.

[2465] The sequence of events is as follows:

[2466] [1] The DIU signals that its buffer for CPU posted writes is empty (and has been for some time in the case shown).

[2467] [2] The CPU asserts “cpu_diu_wdatavalid” to enable a write to the DIU buffer and presents valid address, data and write mask. The CPU considers the write posted and thus complete in the cycle following [2] in the diagram below.

[2468] [3] The DIU stores the address/data/mask in its buffer and indicates to the arbitration logic that a posted write wishes to participate in any upcoming arbitration.

[2469] [4] Provided the CPU still has a pre-access entitlement left, or is next in line for a round-robin award, a slot is arbitrated in favour of the posted write. Note that posted CPU writes have higher arbitration priority than simultaneous CPU reads.

[2470] [5] The DRAM write occurs.

[2471] [6] The earliest that “diu_cpu_write_rdy” can be re-asserted is in the “MSN1” state of the DRAM write. In the same cycle, having seen the re-assertion, the CPU can asynchronously turn around “cpu_diu_wdatavalid” and enable a subsequent posted write, should it wish to do so. The timing of a non-CPU/non-CDU DIU write access is shown below in FIG. 113.

[2472] Compared to a read access, write data is only available from the requester 4 cycles after the address. An extra cycle is used to ensure that data is first registered in the DAU, before being despatched to DRAM. As a result, writes are pre-arbitrated 5 cycles in advance of the main arbitration decision to actually write the data to memory.

[2473] The diagram above shows the following sequence of events:

[2474] [1] A non-CPU block signals a write request.

[2475] [2] A registered version of this is available to the DAU arbitration logic.

[2476] [3] Write pre-arbitration occurs in favour of the requester.

[2477] [4] A write acknowledgment is returned by the DIU.

[2478] [5] The pre-arbitration will only be upheld if the requester supplies 4 consecutive write data quarter-words, qualified by an asserted wvalid flag.

[2479] [6] Provided this has happened, the main arbitration logic is in a position at [6] to reconfirm the pre-arbitration decision. Note however that such reconfirmation may have to wait a further one or two DRAM accesses, if the write is pre-empted by a CPU pre-access and/or a scheduled refresh.

[2480] [7] This is the earliest that the write to DRAM can occur.

[2481] Note that neither the arbitration at [8] nor the pre-arbitration at [9] can award its respective slot to a non-CPU write, due to the ban on back-to-back accesses.

[2482] The timing of a CDU DIU write access is shown overleaf in FIG. 114.

[2483] This is similar to a regular non-CPU write access, but uses page mode to carry out 4 consecutive DRAM writes to contiguous addresses. As a consequence, subsequent accesses are delayed by 6 cycles, as shown in the diagram. Note that a new write can be pre-arbitrated at [10] in FIG. 114.

[2484] 20.14.11 Command Multiplexor Sub-Block

TABLE 136 Command Multiplexor Sub-block IO Definition

Port name (Pins, I/O): Description

Clocks and Resets
pclk (1, In): System Clock
prst_n (1, In): System reset, synchronous active low

DIU Read Interface to SoPEC Units
<unit>_diu_radr[21:5] (17, In): Read address to DIU, 17 bits wide (256-bit aligned word).
diu_<unit>_rack (1, Out): Acknowledge from DIU that read request has been accepted and new read address can be placed on <unit>_diu_radr

DIU Write Interface to SoPEC Units
<unit>_diu_wadr[21:5] (17, In): Write address to DIU except CPU, SCB, CDU. 17 bits wide (256-bit aligned word)
cpu_diu_wadr[21:4] (22, In): CPU Write address to DIU (128-bit aligned address.)
cpu_diu_wmask (16, In): Byte enables for CPU write.
cdu_diu_wadr[21:3] (19, In): CDU Write address to DIU, 19 bits wide (64-bit aligned word). Addresses cannot cross a 256-bit word DRAM boundary.
diu_<unit>_wack (1, Out): Acknowledge from DIU that write request has been accepted and new write address can be placed on <unit>_diu_wadr

Outputs to CPU Interface and Arbitration Logic sub-block
re_arbitrate (1, Out): Signal telling the arbitration logic to choose the next arbitration winner.
re_arbitrate_wadv (1, Out): Signal telling the arbitration logic to choose the next arbitration winner for non-CPU writes 2 timeslots in advance

Debug Outputs to CPU Configuration and Arbitration Logic Sub-block
write_sel (5, Out): Signal indicating the SoPEC Unit for which the current write transaction is occurring. Encoding is described in Table 134.
write_complete (1, Out): Signal indicating that write transaction to SoPEC Unit indicated by write_sel is complete.

Inputs from CPU Interface and Arbitration Logic sub-block
arb_gnt (1, In): Signal lasting 1 cycle which indicates arbitration has occurred and arb_sel is valid.
arb_sel (5, In): Signal indicating which requesting SoPEC Unit has won arbitration. Encoding is described in Table 134.
dir_sel (2, In): Signal indicating the sense of access associated with arb_sel.
  00: issue non-CPU write
  01: read winner
  10: write winner
  11: refresh winner

Inputs from Read Write Multiplexor Sub-block
write_data_valid (2, In): Signal indicating that valid write data is available for the current command.
  00 = not valid
  01 = CPU write data valid
  10 = non-CPU write data valid
  11 = both CPU and non-CPU write data valid
wdata (256, In): 256-bit non-CPU write data
cpu_wdata (32, In): 32-bit CPU write data

Outputs to Read Write Multiplexor Sub-block
write_data_accept (2, Out): Signal indicating the Command Multiplexor has accepted the write data from the write multiplexor.
  00 = not valid
  01 = accepts CPU write data
  10 = accepts non-CPU write data
  11 = not valid

Inputs from DCU
dcu_dau_adv (1, In): Signal indicating to DAU to supply next command to DCU
dcu_dau_wadv (1, In): Signal indicating to DAU to initiate next non-CPU write

Outputs to DCU
dau_dcu_adr[21:5] (17, Out): Signal indicating the address for the DRAM access. This is a 256-bit aligned DRAM address.
dau_dcu_rwn (1, Out): Signal indicating the direction for the DRAM access (1 = read, 0 = write).
dau_dcu_cduwpage (1, Out): Signal indicating if access is a CDU write page mode access (1 = CDU page mode, 0 = not CDU page mode).
dau_dcu_refresh (1, Out): Signal indicating that a refresh command is to be issued. If asserted dau_dcu_adr, dau_dcu_rwn and dau_dcu_cduwpage are ignored.
dau_dcu_wdata (256, Out): 256-bit write data to DCU
dau_dcu_wmask (32, Out): Byte encoded write data mask for 256-bit dau_dcu_wdata to DCU

[2485] 20.14.11.1 Command Multiplexor Sub-Block Description

[2486] The Command Multiplexor sub-block issues read, write or refresh commands to the DCU, according to the SoPEC Unit selected for DRAM access by the Arbitration Logic. The Command Multiplexor signals the Arbitration Logic to perform arbitration to select the next SoPEC Unit for DRAM access. It does this by asserting the re_arbitrate signal. re_arbitrate is asserted when the DCU indicates on dcu_dau_adv that it needs the next command.

[2487] The Command Multiplexor is shown in FIG. 115.

[2488] Initially, the issuing of commands is described. Then the additional complexity of handling non-CPU write commands arbitrated in advance is introduced.

[2489] DAU-DCU Interface

[2490] See Section 20.14.5 for a description of the DAU-DCU interface.

[2491] Generating re_arbitrate

[2492] The condition for asserting re_arbitrate is that the DCU is looking for another command from the DAU. This is indicated by dcu_dau_adv being asserted.

re_arbitrate = dcu_dau_adv

[2493] Interface to SoPEC DIU Requestors

[2494] When the Command Multiplexor initiates arbitration by asserting re_arbitrate to the Arbitration Logic sub-block, the arbitration winner is indicated by the arb_sel[4:0] and dir_sel[1:0] signals returned from the Arbitration Logic. The validity of these signals is indicated by arb_gnt. The encoding of arb_sel[4:0] is shown in Table 134.

[2495] The value of arb_sel[4:0] is used to control the steering multiplexor to select the DIU address of the winning arbitration requestor. The arb_gnt signal is decoded as an acknowledge, diu_<unit>_*ack, back to the winning DIU requestor. The timing of these operations is shown in FIG. 116. adr[21:0] is the output of the steering multiplexor controlled by arb_sel[4:0]. The steering multiplexor can acknowledge DIU requestors in successive cycles.

[2496] Command Issuing Logic

[2497] The address presented by the winning SoPEC requestor from the steering multiplexor is presented to the command issuing logic together with arb_sel[4:0] and dir_sel[1:0].

[2498] The command issuing logic translates the winning command into the signals required by the DCU. adr[21:0], arb_sel[4:0] and dir_sel[1:0] come from the steering multiplexor.

dau_dcu_adr[21:5] = adr[21:5]
dau_dcu_rwn = (dir_sel[1:0] == read)
dau_dcu_cduwpage = (arb_sel[4:0] == CDU write)
dau_dcu_refresh = (dir_sel[1:0] == refresh)
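The translation performed by the command issuing logic can be modelled in software as a check on the encodings. The following is a minimal behavioural sketch, not RTL: the function name and the returned dictionary are hypothetical, while the dir_sel values and the CDU write code (0x01) are taken from the tables in this section.

```python
# dir_sel encodings from the Command Multiplexor IO definition.
DIR_ISSUE_NON_CPU_WRITE = 0b00
DIR_READ = 0b01
DIR_WRITE = 0b10
DIR_REFRESH = 0b11

ARB_CDU_WRITE = 0x01  # CDU(W) in the arb_sel encoding


def issue_command(adr: int, arb_sel: int, dir_sel: int) -> dict:
    """Translate steering multiplexor outputs into the DAU-to-DCU command signals."""
    return {
        # adr[21:5]: drop the 5 low bits to get the 256-bit aligned address.
        "dau_dcu_adr": (adr >> 5) & 0x1FFFF,
        "dau_dcu_rwn": int(dir_sel == DIR_READ),
        "dau_dcu_cduwpage": int(arb_sel == ARB_CDU_WRITE),
        "dau_dcu_refresh": int(dir_sel == DIR_REFRESH),
    }


print(issue_command(adr=0x3FFFE0, arb_sel=0x07, dir_sel=DIR_READ))
```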

[2499] dau_dcu_valid indicates that a valid command is available to the DCU.

[2500] For a write command, dau_dcu_valid will not be asserted until there is also valid write data present. This is indicated by the signal write_data_valid[1:0] from the Read Write Data Multiplexor sub-block.

[2501] For a write command, the data issued to the DCU on dau_dcu_wdata[255:0] is multiplexed from cpu_wdata[31:0] and wdata[255:0] depending on whether the write is a CPU or non-CPU write. The write data from the Write Multiplexor for the CDU is available on wdata[63:0]. This data must be issued to the DCU on dau_dcu_wdata[255:0]. wdata[63:0] is copied to each 64-bit word of dau_dcu_wdata[255:0].

dau_dcu_wdata[255:0] = 0x00000000
if (arb_sel[4:0] == CPU write) then
  dau_dcu_wdata[31:0] = cpu_wdata[31:0]
elsif (arb_sel[4:0] == CDU write) then
  dau_dcu_wdata[63:0] = wdata[63:0]
  dau_dcu_wdata[127:64] = wdata[63:0]
  dau_dcu_wdata[191:128] = wdata[63:0]
  dau_dcu_wdata[255:192] = wdata[63:0]
else
  dau_dcu_wdata[255:0] = wdata[255:0]
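The write data multiplexing can be exercised with a small software model that treats each bus as an integer. This is a sketch under stated assumptions: the function name is hypothetical, and the arb_sel codes for CPU write (0x11) and CDU write (0x01) are taken from Table 134.

```python
ARB_CDU_WRITE = 0x01  # CDU(W) in the arb_sel encoding
ARB_CPU_WRITE = 0x11  # CPU(W) in the arb_sel encoding

MASK_32 = (1 << 32) - 1
MASK_64 = (1 << 64) - 1
MASK_256 = (1 << 256) - 1


def mux_write_data(arb_sel: int, cpu_wdata: int, wdata: int) -> int:
    """Return the 256-bit value driven on dau_dcu_wdata[255:0]."""
    if arb_sel == ARB_CPU_WRITE:
        # CPU data occupies bits [31:0]; the remaining bits stay zero.
        return cpu_wdata & MASK_32
    if arb_sel == ARB_CDU_WRITE:
        # wdata[63:0] is copied into each of the four 64-bit words.
        word = wdata & MASK_64
        return word | (word << 64) | (word << 128) | (word << 192)
    return wdata & MASK_256


print(hex(mux_write_data(ARB_CDU_WRITE, 0, 0x0123456789ABCDEF)))
```

Replicating the 64-bit CDU word across all four positions means the byte mask alone selects which 64-bit word of the 256-bit DRAM word is actually written.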

[2502] CPU Write Masking

[2503] The CPU write data bus is only 128 bits wide. cpu_diu_wmask[15:0] indicates how many bytes of that 128 bits should be written. The associated address cpu_diu_wadr[21:4] is a 128-bit aligned address. The actual DRAM write must be a 256-bit access. The command multiplexor issues the 256-bit DRAM address to the DCU on dau_dcu_adr[21:5]. cpu_diu_wadr[4] and cpu_diu_wmask[15:0] are used jointly to construct a byte write mask dau_dcu_wmask[31:0] for this 256-bit write access.
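One plausible construction of the CPU byte write mask is sketched below. The text above does not spell out the exact mapping, so this sketch assumes that cpu_diu_wadr[4] = 0 selects the lower 16 bytes of the 256-bit word and cpu_diu_wadr[4] = 1 selects the upper 16 bytes. The function name is hypothetical.

```python
def cpu_write_mask(wadr_bit4: int, wmask: int) -> int:
    """Build dau_dcu_wmask[31:0] from cpu_diu_wadr[4] and cpu_diu_wmask[15:0].

    Assumption: address bit 4 selects which 128-bit half of the 256-bit
    DRAM word receives the 16 byte enables.
    """
    if wadr_bit4 not in (0, 1):
        raise ValueError("cpu_diu_wadr[4] is a single bit")
    return (wmask & 0xFFFF) << (16 * wadr_bit4)


print(hex(cpu_write_mask(1, 0x00FF)))
```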

[2504] CDU Write Masking

[2505] The CDU performs four 64-bit word writes to 4 contiguous 256-bit DRAM addresses with the first address specified by cdu_diu_wadr[21:3]. The write address cdu_diu_wadr[21:5] is 256-bit aligned with bits cdu_diu_wadr[4:3] allowing the 64-bit word to be selected. If these 4 DRAM words lie in the same DRAM row then an efficient access will be obtained.

[2506] The command multiplexor logic must issue 4 successive accesses to 256-bit DRAM addresses cdu_diu_wadr[21:5],+1,+2,+3.

[2507] dau_dcu_wmask[31:0] indicates which 8 bytes (64-bits) of the 256-bit word are to be written.

[2508] dau_dcu_wmask[31:0] is calculated using cdu_diu_wadr[4:3] i.e. bits 8*cdu_diu_wadr[4:3] to 8*(cdu_diu_wadr[4:3]+1)−1 of dau_dcu_wmask[31:0] are asserted.
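The CDU address sequence and byte mask described above can be worked through numerically. In this sketch the function name is hypothetical and cdu_diu_wadr is passed as a 19-bit integer holding address bits [21:3]; wrap-around of the 17-bit DRAM address is not modelled.

```python
def cdu_write_accesses(cdu_diu_wadr: int):
    """Return the four (dau_dcu_adr, dau_dcu_wmask) pairs for one CDU write."""
    word_sel = cdu_diu_wadr & 0b11   # cdu_diu_wadr[4:3]: 64-bit word select
    base = cdu_diu_wadr >> 2         # cdu_diu_wadr[21:5]: 256-bit aligned address
    # Assert mask bits 8*word_sel .. 8*(word_sel+1)-1, i.e. one group of 8 bytes.
    wmask = 0xFF << (8 * word_sel)
    return [(base + offset, wmask) for offset in range(4)]


for adr, mask in cdu_write_accesses((0x100 << 2) | 2):
    print(hex(adr), hex(mask))
```

For word select 2 the asserted mask bits are 16 to 23, so the same 64-bit position is written in each of the four consecutive 256-bit words.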

[2509] Arbitrating Non-CPU Writes in Advance

[2510] In the case of non-CPU write commands, the write data must be transferred from the SoPEC requester before the write can occur. Arbitration should occur early to allow for any delay for the write data to be transferred to the DRAM.

[2511] FIG. 113 indicates that write data transfer over 64-bit busses will take a further 4 cycles after the address is transferred. The arbitration must therefore occur 4 cycles in advance of arbitration for read accesses (FIG. 109 and FIG. 110) or for CPU writes (FIG. 112). Arbitration of CDU write accesses (FIG. 114) should take place 1 cycle in advance of arbitration for read and CPU write accesses. To simplify implementation CDU write accesses are arbitrated 4 cycles in advance, similar to other non-CPU writes.

[2512] The Command Multiplexor generates another version of re_arbitrate called re_arbitrate_wadv based on the signal dcu_dau_wadv from the DCU. In the 3 cycle DRAM access dcu_dau_adv and therefore re_arbitrate are asserted in the MSN2 state of the DCU state-machine. dcu_dau_wadv and therefore re_arbitrate_wadv will therefore be asserted in the following RST state, see FIG. 117. This matches the timing required for non-CPU writes shown in FIG. 113 and FIG. 114.

[2513] re_arbitrate_wadv causes the Arbitration Logic to perform an arbitration for non-CPU writes in advance.

re_arbitrate = dcu_dau_adv
re_arbitrate_wadv = dcu_dau_wadv

[2514] If the winner of this arbitration is a non-CPU write then arb_gnt is asserted and the arbitration winner is output on arb_sel[4:0] and dir_sel[1:0]. Otherwise arb_gnt is not asserted.

[2515] Since non-CPU write commands are arbitrated early, the non-CPU command is not issued to the DCU immediately but instead written into an advance command register.

if (arb_sel[4:0] == non-CPU write) then
  advance_cmd_register[3:0] = arb_sel[4:0]
  advance_cmd_register[5:4] = dir_sel[1:0]
  advance_cmd_register[27:6] = adr[21:0]
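The packing of the advance command register can be illustrated as below. Note that the pseudocode assigns the 5-bit arb_sel[4:0] to a 4-bit field; this sketch assumes that only the low four bits are stored, which loses nothing for the non-CPU write codes 0x00 to 0x03. The function names are hypothetical.

```python
def pack_advance_cmd(arb_sel: int, dir_sel: int, adr: int) -> int:
    """Pack a pre-arbitrated non-CPU write into the 28-bit advance command register."""
    return ((arb_sel & 0xF)              # bits [3:0]: low four bits of arb_sel (assumed)
            | ((dir_sel & 0x3) << 4)     # bits [5:4]: dir_sel[1:0]
            | ((adr & 0x3FFFFF) << 6))   # bits [27:6]: adr[21:0]


def unpack_advance_cmd(reg: int):
    """Recover (arb_sel, dir_sel, adr) from the packed register value."""
    return reg & 0xF, (reg >> 4) & 0x3, (reg >> 6) & 0x3FFFFF


packed = pack_advance_cmd(arb_sel=0x03, dir_sel=0b10, adr=0x2ABCDE)
print(hex(packed), unpack_advance_cmd(packed))
```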

[2516] If a DCU command is in progress then the arbitration in advance of a non-CPU write command will overwrite the steering multiplexor input to the command issuing logic. The arbitration in advance happens in the DCU MSN1 state. The new command is available at the steering multiplexor in the MSN2 state. The command in progress will have been latched in the DRAM by MSN falling at the start of the MSN1 state.

[2517] Issuing Non-CPU Write Commands

[2518] The arb_sel[4:0] and dir_sel[1:0] values generated by the Arbitration Logic reflect the out of order arbitration sequence.

[2519] This out of order arbitration sequence is exported to the Read Write Data Multiplexor sub-block. This is so that write data is available in time for the actual write operation to DRAM. Otherwise a latency would be introduced every time a write command is selected.

[2520] However, the Command Multiplexor must execute the command stream in-order. In-order command execution is achieved by waiting until re_arbitrate has advanced to the non-CPU write timeslot from which re_arbitrate_wadv has previously issued a non-CPU write written to the advance command register.

[2521] If re_arbitrate_wadv arbitrates a non-CPU write in advance then within the Arbitration Logic the timeslot is marked to indicate whether a write was issued.

[2522] When re_arbitrate advances to a write timeslot in the Arbitration Logic then one of two actions can occur depending on whether the slot was marked by re_arbitrate_wadv to indicate whether a write was issued or not.

[2523] Non-CPU write arbitrated by re_arbitrate_wadv

[2524] If the timeslot has been marked as having issued a write then the arbitration logic responds to re_arbitrate by issuing arb_sel[4:0], dir_sel[1:0] and asserting arb_gnt as for a normal arbitration but selecting a non-CPU write access. Normally, re_arbitrate does not issue non-CPU write accesses. Non-CPU writes are arbitrated by re_arbitrate_wadv. dir_sel[1:0]==00 indicates a non-CPU write issued by re_arbitrate.

[2525] The command multiplexor does not write the command into the advance command register as it has already been placed there earlier by re_arbitrate_wadv. Instead, the already present write command in the advance command register is issued when write_data_valid[1]=1. Note that the value of arb_sel[4:0] issued by re_arbitrate could specify a different write than that in the advance command register since time has advanced. It is always the command in the advance command register that is issued. The steering multiplexor in this case must not issue an acknowledge back to the SoPEC requester indicated by the value of arb_sel[4:0].

if (dir_sel[1:0] == 00) then
  command_issuing_logic[27:0] = advance_cmd_register[27:0]
else
  command_issuing_logic[27:0] = steering_multiplexor[27:0]
ack = arb_gnt AND NOT (dir_sel[1:0] == 00)
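The selection between the advance command register and the steering multiplexor, together with the suppression of the acknowledge, can be expressed compactly in software. The sketch below is a behavioural illustration with a hypothetical function name; the 28-bit command words are treated as opaque integers.

```python
DIR_ISSUE_NON_CPU_WRITE = 0b00  # dir_sel value meaning "issue non-CPU write"


def select_command(dir_sel: int, arb_gnt: int,
                   advance_cmd_register: int, steering_multiplexor: int):
    """Return (command word for the command issuing logic, acknowledge flag)."""
    from_advance = dir_sel == DIR_ISSUE_NON_CPU_WRITE
    command = advance_cmd_register if from_advance else steering_multiplexor
    # The pre-arbitrated write was acknowledged earlier, so no second acknowledge.
    ack = bool(arb_gnt) and not from_advance
    return command, ack


print(select_command(0b00, 1, advance_cmd_register=0xABC, steering_multiplexor=0x123))
print(select_command(0b01, 1, advance_cmd_register=0xABC, steering_multiplexor=0x123))
```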

[2526] Non-CPU write not arbitrated by re_arbitrate_wadv

[2527] If the timeslot has been marked as not having issued a write, then re_arbitrate will use the un-used read timeslot selection to replace the un-used write timeslot with a read timeslot according to Section 20.10.6.2 Unused read timeslots allocation.

[2528] The mechanism for write timeslot arbitration selects non-CPU writes in advance. But the selected non-CPU write is stored in the Command Multiplexor and issued when the write data is available. This means that even if this timeslot is overwritten by the CPU reprogramming the timeslot before the write command is actually issued to the DRAM, the originally arbitrated non-CPU write will always be correctly issued.

[2529] Accepting Write Commands

[2530] When a write command is issued then write_data_accept[1:0] is asserted. This tells the Write Multiplexor that the current write data has been accepted by the DRAM and the write multiplexor can receive write data from the next arbitration winner if it is a write. write_data_accept[1:0] differentiates between CPU and non-CPU writes. A write command is known to have been issued when re_arbitrate_wadv to decide on the next command is detected.

[2531] In the case of CDU writes the DCU will generate a signal dcu_dau_cduwaccept which tells the Command Multiplexor to issue a write_data_accept[1]. This will result in the Write Multiplexor supplying the next CDU write data to the DRAM.

write_data_accept[0] = RISING EDGE(re_arbitrate_wadv) AND command_issuing_logic(dir_sel[1] == 1) AND command_issuing_logic(arb_sel[4:0] == CPU)
write_data_accept[1] = (RISING EDGE(re_arbitrate_wadv) AND command_issuing_logic(dir_sel[1] == 1) AND command_issuing_logic(arb_sel[4:0] == non_CPU)) OR dcu_dau_cduwaccept == 1

[2532] Debug logic output to CPU Configuration and Arbitration Logic sub-block

write_sel[4:0] reflects the value of arb_sel[4:0] at the command issuing logic. The signal write_complete is asserted when any bit of write_data_accept[1:0] is asserted.

write_complete = write_data_accept[0] OR write_data_accept[1]

[2533] write_sel[4:0] and write_complete are CPU readable from the DIUPerformance and WritePerformance status registers. When write_complete is asserted write_sel[4:0] will indicate which write access the DAU has issued.

[2534] 20.14.12 CPU Configuration and Arbitration Logic Sub-Block

TABLE 137 CPU Configuration and Arbitration Logic Sub-block IO Definition

Port name (Pins, I/O): Description

Clocks and Resets
pclk (1, In): System Clock
prst_n (1, In): System reset, synchronous active low

CPU Interface data and control signals
cpu_adr[10:2] (9, In): 9 bits (bits 10:2) are required to decode the configuration register address space.
cpu_dataout (32, In): Shared write data bus from the CPU for DRAM and configuration data
diu_cpu_data (32, Out): Configuration, status and debug read data bus to the CPU
diu_cpu_debug_valid (1, Out): Signal indicating the data on the diu_cpu_data bus is valid debug data.
cpu_rwn (1, In): Common read/not-write signal from the CPU
cpu_acode (2, In): CPU access code signals.
  cpu_acode[0] - Program (0) / Data (1) access
  cpu_acode[1] - User (0) / Supervisor (1) access
  The DAU will only allow supervisor mode accesses to data space.
cpu_diu_sel (1, In): Block select from the CPU. When cpu_diu_sel is high both cpu_adr and cpu_dataout are valid
diu_cpu_rdy (1, Out): Ready signal to the CPU. When diu_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the block and for a read cycle this means the data on diu_cpu_data is valid.
diu_cpu_berr (1, Out): Bus error signal to the CPU indicating an invalid access.

DIU Read Interface to SoPEC Units
<unit>_diu_rreq (11, In): SoPEC unit requests DRAM read.

DIU Write Interface to SoPEC Units
diu_cpu_write_rdy (1, In): Indicator that CPU posted write buffer is empty.
<unit>_diu_wreq (4, In): Non-CPU SoPEC unit requests DRAM write.

Inputs from Command Multiplexor sub-block
re_arbitrate (1, In): Signal telling the arbitration logic to choose the next arbitration winner.
re_arbitrate_wadv (1, In): Signal telling the arbitration logic to choose the next arbitration winner for non-CPU writes 2 timeslots in advance

Outputs to DCU
dau_dcu_msn2stall (1, Out): Signal from DAU Arbitration Logic which, when asserted, stalls the DCU in MSN2 state.

Inputs from Read and Write Multiplexor sub-block
read_cmd_rdy (2, In): Signal indicating that read multiplexor is ready for next read command.
  00 = not ready
  01 = ready for CPU read
  10 = ready for non-CPU read
  11 = ready for both CPU and non-CPU reads
write_cmd_rdy (2, In): Signal indicating that write multiplexor is ready for next write command.
  00 = not ready
  01 = ready for CPU write
  10 = ready for non-CPU write
  11 = ready for both CPU and non-CPU write

Outputs to other DAU sub-blocks
arb_gnt (1, Out): Signal lasting 1 cycle which indicates arbitration has occurred and arb_sel is valid.
arb_sel (5, Out): Signal indicating which requesting SoPEC Unit has won arbitration. Encoding is described in Table 134.
dir_sel (2, Out): Signal indicating the sense of access associated with arb_sel.
  00: issue non-CPU write
  01: read winner
  10: write winner
  11: refresh winner

Debug Inputs from Read-Write Multiplexor sub-block
read_sel (5, In): Signal indicating the SoPEC Unit for which the current read transaction is occurring. Encoding is described in Table 134.
read_complete (1, In): Signal indicating that read transaction to SoPEC Unit indicated by read_sel is complete.

Debug Inputs from Command Multiplexor sub-block
write_sel (5, In): Signal indicating the SoPEC Unit for which the current write transaction is occurring. Encoding is described in Table 134.
write_complete (1, In): Signal indicating that write transaction to SoPEC Unit indicated by write_sel is complete.

Debug Inputs from DCU
dcu_dau_refreshcomplete (1, In): Signal indicating that the DCU has completed a refresh.

Debug Inputs from DAU IO
various (n, In): Various DAU IO signals which can be monitored in debug mode

[2535] The CPU Interface and Arbitration Logic sub-block is shown in FIG. 118.

[2536] 20.14.12.1 CPU Interface and Configuration Registers Description

[2537] The CPU Interface and Configuration Registers sub-block provides for the CPU to access DAU specific registers by reading or writing to the DAU address space.

[2538] The CPU subsystem bus interface is described in more detail in Section 11.4.3. The DAU block will only allow supervisor mode accesses to data space (i.e. cpu_acode[1:0]=b11). All other accesses will result in diu_cpu_berr being asserted.
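The access check described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and the bit meanings are taken from the cpu_acode description (bit 0: program/data, bit 1: user/supervisor).

```python
def access_raises_berr(cpu_acode: int) -> bool:
    """Return True when diu_cpu_berr would be asserted (hypothetical helper).

    cpu_acode[0]: 0 = program access, 1 = data access
    cpu_acode[1]: 0 = user mode,     1 = supervisor mode
    Only supervisor-mode data accesses (b11) are allowed.
    """
    SUPERVISOR_DATA = 0b11
    return (cpu_acode & 0b11) != SUPERVISOR_DATA
```

Only the code b11 passes; the other three codes would raise a bus error.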

[2539] The configuration registers described in Section 20.14.9 TABLE130 DAU configuration registers Address # (DIU_base +) Register bitsReset Description Reset 0x00 Reset 1 0x1 A write to this register causesa reset of the DIU. This register can be read to indicate the resetstate: 0 - reset in progress 1 - reset not in progress Refresh 0x04RefreshPeriod 9 0x063 Refresh controller. When set to 0 refresh is off,otherwise the value indicates the number of cycles, less one, betweeneach refresh. [Note that for a system clock fre- quency of 160 MHz, avalue exceeding 0x63 (indicating a 100-cycle refresh period) should notbe programmed, or the DRAM will malfunction.] Timeslot allocation andcontrol 0x08 NumMainTimeslots 6 0x01 Number of main timeslots (1-64)less one 0x0C CPUPreAccessTimes 4 0x0 (CPUPreAccessTimeslots + 1) lotsmain slots out of a total of (CPUTotalTimeslots + 1) are preceded by aCPU access. 0x10 CPUTotalTimeslots 4 0x0 (CPUPreAccessTimeslots + 1)main slots out of a total of (CPUTotalTimeslots + 1) are preceded by aCPU access. 0x100-0x1FC MainTimeslot[63:0] 64 × 4 [63:1][3:0] = 0x0Programmable main timeslots [0][3:0] = 0xE (up to 64 main timeslots).0x200 ReadRoundRobinLevel 12 0x000 For each read requester plus refresh0 = level1 of round-robin 1 = level2 of round-robin The bit order isdefined in Table . 0x204 EnableCPURoundRobin 1 0x1 Allows the CPU toparticpate in the unused read round-robin scheme. If disabled, theshared CPU/refresh round-robin position is dedicated solely to refresh.0x208 RotationSync 1 0x1 Writing 0, followed by 1 to this bit allows thetimeslot rotation to advance on a cycle basis which can be determined bythe CPU. 0x20C minNonCPUReadAdr 12 0x800 12 MSBs of lowest DRAM addresswhich may be read by non-CPU requesters. 0x210 minDWUWriteAdr 12 0x80012 MSBs of lowest DRAM address which may be written to by the DWU. 0x214minNonCPUWriteAdr 12 0x800 12 MSBs of lowest DRAM address which may bewritten to by non-CPU requesters other than the DWU. 
Debug 0x300DebugSelect[11:2] 10 0x304 Debug address select. Indicates the addressof the register to report on the diu_cpu_data bus when it is nototherwise being used. When this signal carries debug information thesignal diu_cpu_debug_valid will be asserted. Debug: arbitration andperformance 0x304 ArbitrationHistory 22 — Bit 0 = arb_gnt Bit 1 =arb_executed Bit 6:2 = arb_sel[4:0] Bit 12:7 = timeslot_number[5:0] Bit15:13 = access_type[2:0] Bit 16 = back2back_non_cpu_write Bit 17 =sticky_back2back_non_cpu_write (Sticky version of same, cleared onreset.) Bit 18 = rotation_sync Bit 20:19 = rotation_state Bit 21 =sticky_invalid_non_cpu_adr See Section 20.14.9.2 DIU Debug for adescription of the fields. Read only register. 0x308 DIUPerformance 31 —Bit 0 = cpu_diu_rreq Bit 1 = scb_diu_rreq Bit 2 = cdu_diu_rreq Bit 3 =cfu_diu_rreq Bit 4 = lbd_diu_rreq Bit 5 = sfu_diu_rreq Bit 6 =td_diu_rreq Bit 7 = tfs_diu_rreq Bit 8 = hcu_diu_rreq Bit 9 =dnc_diu_rreq Bit 10 = llu_diu_rreq Bit 11 = pcu_diu_rreq Bit 12 =cpu_diu_wreq Bit 13 = scb_diu_wreq Bit 14 = cdu_diu_wreq Bit 15 =sfu_diu_wreq Bit 16 = dwu_diu_wreq Bit 17 = refresh_req Bit 22:18 =read_sel[4:0] Bit 23 = read_complete Bit 28:24 = write_sel[4:0] Bit 29 =write_complete Bit 30 = dcu_dau_refreshcomplete See Section 20.14.9.2DIU Debug for a description of the fields. Read only register. Debug DIUread requesters interface signals 0x30C CPUReadInterface 25 — Bit 0 =cpu_diu_rreq Bit 22:1 = cpu_adr[21:0] Bit 23 = diu_cpu_rack Bit 24 =diu_cpu_rvalid Read only register. 0x310 SCBReadInterface 20 Bit 0 =scb_diu_rreq Bit 17:1 = scb_diu_radr[21:5] Bit 18 = diu_scb_rack Bit 19= diu_scb_rvalid Read only register. 0x314 CDUReadInterface 20 — Bit 0 =cdu_diu_rreq Bit 17:1 = cdu_diu_radr[21:5] Bit 18 = diu_cdu_rack Bit 19= diu_cdu_rvalid Read only register. 0x318 CFUReadInterface 20 — Bit 0 =cfu_diu_rreq Bit 17:1 = cfu_diu_radr[21:5] Bit 18 = diu_cfu_rack Bit 19= diu_cfu_rvalid Read only register. 
0x31C LBDReadInterface 20 — Bit 0 =lbd_diu_rreq Bit 17:1 = lbd_diu_radr[21:5] Bit 18 = diu_lbd_rack Bit 19= diu_lbd_rvalid Read only register. 0x320 SFUReadInterface 20 — Bit 0 =sfu_diu_rreq Bit 17:1 = sfu_diu_radr[21:5] Bit 18 = diu_sfu_rack Bit 19= diu_sfu_rvalid Read only register. 0x324 TDReadInterface 20 — Bit 0 =td_diu_rreq Bit 17:1 = td_diu_radr[21:5] Bit 18 = diu_td_rack Bit 19 =diu_td_rvalid Read only register. 0x328 TFSReadInterface 20 — Bit 0 =tfs_diu_rreq Bit 17:1 = tfs_diu_radr[21:5] Bit 18 = diu_tfs_rack Bit 19= diu_tfs_rvalid Read only register. 0x32C HCUReadInterface 20 — Bit 0 =hcu_diu_rreq Bit 17:1 = hcu_diu_radr[21:5] Bit 18 = diu_hcu_rack Bit 19= diu_hcu_rvalid Read only register. 0x330 DNCReadInterface 20 — Bit 0 =dnc_diu_rreq Bit 17:1 = dnc_diu_radr[21:5] Bit 18 = diu_dnc_rack Bit 19= diu_dnc_rvalid Read only register. 0x334 LLUReadInterface 20 — Bit 0 =llu_diu_rreq Bit 17:1 = lluu_diu_radr[21:5] Bit 18 = diu_llu_rack Bit 19= diu_llu_rvalid Read only register. 0x338 PCUReadInterface 20 — Bit 0 =pcu_diu_rreq Bit 17:1 = pcu_diu_radr[21:5] Bit 18 = diu_pcu_rack Bit 19= diu_pcu_rvalid Read only register. Debug DIU write requestersinterface signals 0x33C CPUWriteInterface 27 — Bit 0 = cpu_diu_wreq Bit22:1 = cpu_adr[2.1:0] Bit 24:23 = cpu_diu_wmask[1:0] Bit 25 =diu_cpu_wack Bit 26 = cpu_diu_wvalid Read only register. 0x340SCBWriteInterface 20 — Bit 0 = scb_diu_wreq Bit 17:1 =scb_diu_wadr[21:5] Bit 18 = diu_scb_wack Bit 19 = scb_diu_wvalid Readonly register. 0x344 CDUWriteInterface 22 — Bit 0 = cdu_diu_wreq Bit19:1 = cdu_diu_wadr[21:3] Bit 20 = diu_cdu_wack Bit 21 = cdu_diu_wvalidRead only register. 0x348 SFUWriteInterface 20 — Bit 0 = sfu_diu_wreqBit 17:1 = sfu_diu_wadr[21:5] Bit 18 = diu_sfu_wack Bit 19 =sfu_diu_wvalid Read only register. 0x34C DWUWriteInterface 20 — Bit 0 =dwu_diu_wreq Bit 17:1 = dwu_diu_wadr[21:5] Bit 18 = diu_dwu_wack Bit 19= dwu_diu_wvalid Read only register. 
Debug DAU-DCU interface signals0x350 DAU-DCUInterface 25 — Bit 16:0 = dau_dcu_adr[21:5] Bit 17 =dau_dcu_rwn Bit 18 = dau_dcu_cduwpage Bit 19 = dau_dcu_refresh Bit 20 =dau_dcu_msn2stall Bit 21 = dcu_dau_adv Bit 22 = dcu_dau_wadv Bit 23 =dcu_dau_refreshcomplete Bit 24 = dcu_dau_rvalid Read only register.

[2540] are implemented here.

[2541] 20.14.12.2 Arbitration Logic Description

[2542] Arbitration is triggered by the signal re_arbitrate from the Command Multiplexor sub-block. The signal arb_gnt indicates that arbitration has occurred, and the arbitration winner is indicated by arb_sel[4:0]. The encoding of arb_sel[4:0] is shown in Table. The signal dir_sel[1:0] indicates whether the arbitration winner is a read, write or refresh. Arbitration should complete within one clock cycle, so arb_gnt is normally asserted the clock cycle after re_arbitrate and stays high for 1 clock cycle. arb_sel[4:0] and dir_sel[1:0] persist until arbitration occurs again. The arbitration timing is shown in FIG. 119.

[2543] 20.14.12.2.1 Rotation Synchronisation

[2544] A configuration bit, RotationSync, is used to initialise advancement through the timeslot rotation, in order that the CPU will know, on a cycle basis, which timeslot is being arbitrated. This is essential for debug purposes, so that exact arbitration sequences can be reproduced.

[2545] In general, if RotationSync is set, slots continue to be arbitrated in the regular order specified by the timeslot rotation. When the bit is cleared, the current rotation continues until the slot pointers for pre- and main arbitration reach zero. The arbitration logic then grants DRAM access exclusively to the CPU and refreshes.

[2546] When the CPU again writes to RotationSync to cause a 0-to-1 transition of the bit, the rdy acknowledgment back to the CPU for this write will be exactly coincident with the RST cycle of the initial refresh which heralds the enabling of a new rotation. This refresh, along with the second access, which can be either a CPU pre-access or a refresh (depending on the CPU's request inputs), forms a 2-access “preamble” before the first non-CPU requester in the new rotation can be serviced. This preamble is necessary to give the write pre-arbitration the necessary head start on the main arbitration, so that write data can be loaded in time. See FIG. 105 below. The same preamble procedure is followed when emerging from reset.

[2547] The alignment of rdy with the commencement of the rotation ensures that the CPU is always able to calculate at any point how far a rotation has progressed. RotationSync has a reset value of 1 to ensure that the default power-up rotation can take place.

[2548] Note that any CPU writes to the DIU's other configuration registers should only be made when RotationSync is cleared. This ensures that accesses by non-CPU requesters to DRAM are not affected by partial configuration updates which have yet to be completed.

[2549] 20.14.12.2.2 Motivation for Rotation Synchronisation

[2550] The motivation for this feature is that communications with SoPEC from external sources are synchronised to the internal clock but not to our position within a full DIU timeslot rotation. This means that if an external source told SoPEC to start a print 3 separate times, it would likely be at three different points within a full DIU rotation. This difference means that the DIU arbitration for each of the runs would be different, which would manifest itself externally as anomalous or inconsistent print performance. The lack of reproducibility is the problem here.

[2551] However, if in response to the external source saying to start the print, we caused the DIU to pass through a known state at a fixed time offset to other internal actions, this would result in reproducible prints. So, the plan is that the software would do a rotation synchronise action, then write “Go” into various PEP units to cause the prints. This means the DIU state will be identical with respect to the PEP units' state between separate runs.

[2552] 20.14.12.2.3 Wind-Down Protocol When Rotation Synchronisation is Initiated

[2553] When a zero is written to “RotationSync”, this initiates a “wind-down protocol” in the DIU, in which any rotation already begun must be fully completed. The protocol implements the following sequence:

[2554] The pre-arbitration logic must reach the end of whatever rotation it is on and stop pre-arbitrating.

[2555] Only when this has happened does the main arbitration consider doing likewise with its current rotation. Note that the main arbitration lags the pre-arbitration by at least 2 DRAM accesses, subject to variation by CPU pre-accesses and/or scheduled refreshes, so that the two arbitration processes are sometimes on different rotations.

[2556] Once the main arbitration has reached the end of its rotation, rotation synchronisation is considered to be fully activated. Arbitration then proceeds as outlined in the next section.

[2557] 20.14.12.2.4 Arbitration During Rotation Synchronisation

[2558] Note that when RotationSync is ‘0’ and the terminating rotation has completely drained out, DRAM arbitration is granted according to the following fixed priority order: Scheduled Refresh -> CPU(W) -> CPU(R) -> Default Refresh.
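A minimal sketch of this fixed priority order is given here for illustration. The request names and the function are hypothetical; the sketch simply returns the first requesting source in the stated order, and it is assumed that the caller marks the default refresh as requesting whenever it is available.

```python
# Fixed priority order while RotationSync is 0 (highest priority first).
ROTATION_SYNC_PRIORITY = (
    "scheduled_refresh",
    "cpu_write",
    "cpu_read",
    "default_refresh",
)

def arbitrate_during_sync(requests):
    """Return the highest-priority requesting source, or None.

    `requests` maps a source name to True when it is requesting
    (hypothetical representation, not from the specification).
    """
    for source in ROTATION_SYNC_PRIORITY:
        if requests.get(source, False):
            return source
    return None
```

For example, a pending CPU write beats a pending CPU read, and both lose to a scheduled refresh.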

[2559] CPU pre-access counters play no part in arbitration during this period. It is only subsequently, when emerging from rotation sync, that they are reloaded with the values of CPUPreAccessTimeslots and CPUTotalTimeslots and normal service resumes.

[2560] 20.14.12.2.5 Timeslot-Based Arbitration

[2561] Timeslot-based arbitration works by having a pointer point to the current timeslot. This is shown in FIG. 95 repeated here as FIG. 121. When re-arbitration is signaled the arbitration winner is the current timeslot and the pointer advances to the next timeslot. Each timeslot denotes a single access. The duration of the timeslot depends on the access.

[2562] If the SoPEC Unit assigned to the current timeslot is not requesting then the unused timeslot arbitration mechanism outlined in Section 20.10.6 is used to select the arbitration winner. Note that this unused slot re-allocation is guaranteed to produce a result, because of the inclusion of refresh in the round-robin scheme.

[2563] Pseudo-code to represent arbitration is given below:

    if re_arbitrate == 1 then
        arb_gnt = 1
        if current timeslot requesting then
            choose(arb_sel, dir_sel) at current timeslot
        else // un-used timeslot scheme
            choose winner according to un-used timeslot allocation of Section 20.10.6
    else
        arb_gnt = 0
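The timeslot scheme above can be sketched in executable form as follows. This is an illustrative model only: the function name, the list-of-unit-names representation of the rotation, and the `unused_slot_winner` callback (standing in for the unused timeslot allocation of Section 20.10.6) are assumptions rather than part of the design.

```python
def arbitrate(timeslots, pointer, requesting, unused_slot_winner):
    """Model one re_arbitrate event.

    timeslots: list of unit names, one per programmed main timeslot
    pointer: index of the current timeslot
    requesting: set of unit names currently requesting
    unused_slot_winner: callable choosing a winner for an unused slot
                        (hypothetical stand-in for Section 20.10.6)
    Returns (winner, next_pointer).
    """
    unit = timeslots[pointer]
    if unit in requesting:
        winner = unit
    else:
        winner = unused_slot_winner(requesting)
    # The pointer advances whether or not the slot was used by its owner.
    return winner, (pointer + 1) % len(timeslots)
```

The callback is expected always to return a winner, mirroring the guarantee that refresh is part of the round-robin scheme.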

[2564] 20.14.12.3 Arbitrating Non-CPU Writes in Advance

[2565] In the case of non-CPU write commands, the write data must be transferred from the SoPEC requester before the write can occur. Arbitration should occur early to allow for any delay for the write data to be transferred to the DRAM.

[2566] FIG. 113 indicates that write data transfer over 64-bit busses will take a further 4 cycles after the address is transferred. The arbitration must therefore occur 4 cycles in advance of arbitration for read accesses (FIG. 109 and FIG. 110) or for CPU writes (FIG. 112). Arbitration of CDU write accesses (FIG. 114) should take place 1 cycle in advance of arbitration for read and CPU write accesses. To simplify implementation, CDU write accesses are arbitrated 4 cycles in advance, similar to other non-CPU writes.

[2567] The Command Multiplexor generates a second arbitration signal re_arbitrate_wadv which initiates the arbitration in advance of non-CPU write accesses.

[2568] The timeslot scheme is then modified to have 2 separate pointers:

[2569] re_arbitrate can arbitrate read, refresh and CPU read and write accesses according to the position of the current timeslot pointer.

[2570] re_arbitrate_wadv can arbitrate only non-CPU write accesses according to the position of the write lookahead pointer.

[2571] Pseudo-code to represent arbitration is given below:

    //re_arbitrate
    if (re_arbitrate == 1) AND (current timeslot pointer != non-CPU write) then
        arb_gnt = 1
        if current timeslot requesting then
            choose(arb_sel, dir_sel) at current timeslot
        else // un-used read timeslot scheme
            choose winner according to un-used read timeslot allocation of Section 20.10.6.2

[2572] If the SoPEC Unit assigned to the current timeslot is not requesting then the unused read timeslot arbitration mechanism outlined in Section 20.10.6.2 is used to select the arbitration winner.

    //re_arbitrate_wadv
    if (re_arbitrate_wadv == 1) AND (write lookahead timeslot pointer == non-CPU write) then
        if write lookahead timeslot requesting then
            choose(arb_sel, dir_sel) at write lookahead timeslot
            arb_gnt = 1
        elsif un-used write timeslot scheme has a requestor then
            choose winner according to un-used write timeslot allocation of Section 20.10.6.1
            arb_gnt = 1
        else //no arbitration winner
            arb_gnt = 0

[2573] re_arbitrate is generated in the MSN2 state of the DCU state-machine, whereas

[2574] re_arbitrate_wadv is generated in the RST state. See FIG. 103.

[2575] The write lookahead pointer points two timeslots in advance of the current timeslot pointer. Therefore re_arbitrate_wadv causes the Arbitration Logic to perform an arbitration for non-CPU writes two timeslots in advance. As noted in Table, each timeslot lasts at least 3 cycles. Therefore re_arbitrate_wadv arbitrates at least 4 cycles in advance.

[2576] At initialisation, the write lookahead pointer points to the first timeslot. The current timeslot pointer is invalid until the write lookahead pointer advances to the third timeslot, when the current timeslot pointer will point to the first timeslot. Then both pointers advance in tandem.

[2577] Some accesses can be preceded by a CPU access as in Table. These CPU accesses are not allocated timeslots. If this is the case the timeslot will last 3 (CPU access)+3 (non-CPU access)=6 cycles. In that case, a second write lookahead pointer, the CPU pre-access write lookahead pointer, is selected which points only one timeslot in advance. re_arbitrate_wadv will still arbitrate 4 cycles in advance.
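The start-up behaviour of the two pointers can be illustrated with a short sketch. The function name and the use of None for an invalid current pointer are assumptions made for illustration; CPU pre-accesses and the one-slot lookahead variant are not modelled.

```python
def pointer_sequence(num_slots, steps):
    """List (current_pointer, write_lookahead_pointer) for each step.

    The write lookahead pointer starts at timeslot 0. The current pointer
    is invalid (None) until the lookahead pointer reaches the third
    timeslot, after which both advance in tandem, two slots apart.
    """
    sequence = []
    for step in range(steps):
        lookahead = step % num_slots
        current = (step - 2) % num_slots if step >= 2 else None
        sequence.append((current, lookahead))
    return sequence
```

With a 4-slot rotation, the first five steps are (None, 0), (None, 1), (0, 2), (1, 3), (2, 0), showing the lookahead pointer wrapping while staying two slots ahead.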

[2578] 20.14.12.3.1 Issuing Non-CPU Write Commands

[2579] Although the Arbitration Logic will arbitrate non-CPU writes in advance, the Command Multiplexor must issue all accesses in the timeslot order. This is achieved as follows:

[2580] If re_arbitrate_wadv arbitrates a non-CPU write in advance then within the Arbitration Logic the timeslot is marked to indicate whether a write was issued.

    //re_arbitrate_wadv
    if (re_arbitrate_wadv == 1) AND (write lookahead timeslot pointer == non-CPU write) then
        if write lookahead timeslot requesting then
            choose(arb_sel, dir_sel) at write lookahead timeslot
            arb_gnt = 1
            MARK_timeslot = 1
        elsif un-used write timeslot scheme has a requestor then
            choose winner according to un-used write timeslot allocation of Section 20.10.6.1
            arb_gnt = 1
            MARK_timeslot = 1
        else //no pre-arbitration winner
            arb_gnt = 0
            MARK_timeslot = 0

[2581] When re_arbitrate advances to a write timeslot in the Arbitration Logic then one of two actions can occur depending on whether the slot was marked by re_arbitrate_wadv to indicate whether a write was issued or not.

[2582] Non-CPU write arbitrated by re_arbitrate_wadv

[2583] If the timeslot has been marked as having issued a write then the arbitration logic responds to re_arbitrate by issuing arb_sel[4:0], dir_sel[1:0] and asserting arb_gnt as for a normal arbitration but selecting a non-CPU write access. Normally, re_arbitrate does not issue non-CPU write accesses. Non-CPU writes are arbitrated by re_arbitrate_wadv. dir_sel[1:0]==00 indicates a non-CPU write issued by re_arbitrate.

[2584] Non-CPU write not arbitrated by re_arbitrate_wadv

[2585] If the timeslot has been marked as not having issued a write, then re_arbitrate will use the un-used read timeslot selection to replace the un-used write timeslot with a read timeslot according to Section 20.10.6.2 Unused read timeslots allocation.

    //re_arbitrate except for non-CPU writes
    if (re_arbitrate == 1) AND (current timeslot pointer != non-CPU write) then
        arb_gnt = 1
        if current timeslot requesting then
            choose(arb_sel, dir_sel) at current timeslot
        else // un-used read timeslot scheme
            choose winner according to un-used read timeslot allocation of Section 20.10.6.2
    //non-CPU write MARKED as issued
    elsif (re_arbitrate == 1) AND (current timeslot pointer == non-CPU write) AND (MARK_timeslot == 1) then
        //indicate to Command Multiplexor that non-CPU write has been arbitrated in advance
        arb_gnt = 1
        dir_sel[1:0] = 00
    //non-CPU write not MARKED as issued
    elsif (re_arbitrate == 1) AND (current timeslot pointer == non-CPU write) AND (MARK_timeslot == 0) then
        choose winner according to un-used read timeslot allocation of Section 20.10.6.2
        arb_gnt = 1
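The interaction between pre-arbitration marking and main arbitration can be modelled roughly as below. The class, method names and returned labels are hypothetical, and clearing the mark once the slot has been consumed is an assumption of this sketch rather than something the text states.

```python
class MarkedArbiter:
    """Toy model of the MARK_timeslot mechanism (illustrative only)."""

    def __init__(self, slot_is_write):
        self.slot_is_write = slot_is_write          # list of bool per timeslot
        self.marked = [False] * len(slot_is_write)  # MARK_timeslot per slot

    def pre_arbitrate(self, slot, slot_requesting, unused_write_requesting):
        """Model re_arbitrate_wadv; returns the value of arb_gnt."""
        if not self.slot_is_write[slot]:
            return False
        self.marked[slot] = slot_requesting or unused_write_requesting
        return self.marked[slot]

    def main_arbitrate(self, slot):
        """Model re_arbitrate; returns a label for the action taken."""
        if self.slot_is_write[slot] and self.marked[slot]:
            self.marked[slot] = False                # assumption: mark is consumed
            return "issue_pre_arbitrated_write"      # dir_sel[1:0] = 00
        if self.slot_is_write[slot]:
            return "reallocate_to_unused_read"       # Section 20.10.6.2 scheme
        return "normal_arbitration"
```

A write slot that found a requester in advance is issued with dir_sel = 00; one that did not is handed to the unused read allocation.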

[2586] 20.14.12.4 Flow Control

[2587] If read commands are to win arbitration, the Read Multiplexor must be ready to accept the read data from the DRAM. This is indicated by the read_cmd_rdy[1:0] signal. read_cmd_rdy[1:0] supplies flow control from the Read Multiplexor.

    read_cmd_rdy[0] == 1 //Read multiplexor ready for CPU read
    read_cmd_rdy[1] == 1 //Read multiplexor ready for non-CPU read

[2588] The Read Multiplexor will normally always accept CPU reads, see Section 20.14.13.1, so read_cmd_rdy[0]==1 should always apply.

[2589] Similarly, if write commands are to win arbitration, the Write Multiplexor must be ready to accept the write data from the winning SoPEC requestor. This is indicated by the write_cmd_rdy[1:0] signal. write_cmd_rdy[1:0] supplies flow control from the Write Multiplexor.

    write_cmd_rdy[0] == 1 //Write multiplexor ready for CPU write
    write_cmd_rdy[1] == 1 //Write multiplexor ready for non-CPU write

[2590] The Write Multiplexor will normally always accept CPU writes, see Section 20.14.13.2, so write_cmd_rdy[0]==1 should always apply.

[2591] Non-CPU Read Flow Control

[2592] If re_arbitrate selects an access then the signal dau_dcu_msn2stall is asserted until the Read Write Multiplexor is ready.

[2593] arb_gnt is not asserted until the Read Write Multiplexor is ready.

[2594] This mechanism will stall the DCU access to the DRAM until the Read Write Multiplexor is ready to accept the next data from the DRAM in the case of a read.

    //other access flow control
    dau_dcu_msn2stall = (((re_arbitrate selects CPU read) AND read_cmd_rdy[0] == 0) OR
                         ((re_arbitrate selects non-CPU read) AND read_cmd_rdy[1] == 0))
    arb_gnt not asserted until dau_dcu_msn2stall de-asserts
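As an illustration of the read flow control, the stall condition can be written as a small function. The function name and the string labels for the selected access are hypothetical; read_cmd_rdy is treated as a 2-bit integer with bit 0 for CPU reads and bit 1 for non-CPU reads, as in the text.

```python
def msn2stall(selected_access, read_cmd_rdy):
    """Return True when dau_dcu_msn2stall would be asserted.

    selected_access: "cpu_read", "non_cpu_read" or any other label
    read_cmd_rdy: 2-bit value; bit 0 = ready for CPU read,
                  bit 1 = ready for non-CPU read
    """
    cpu_ready = bool(read_cmd_rdy & 0b01)
    non_cpu_ready = bool(read_cmd_rdy & 0b10)
    return (selected_access == "cpu_read" and not cpu_ready) or (
        selected_access == "non_cpu_read" and not non_cpu_ready
    )
```

In this model arb_gnt would be held off for as long as the function returns True.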

[2595] 20.14.12.5 Arbitration Hierarchy

[2596] CPU and refresh are not included in the timeslot allocations defined in the DAU configuration registers of Table.

[2597] The hierarchy of arbitration under normal operation is

[2598] a. CPU access

[2599] b. Refresh access

[2600] c. Timeslot access.

[2601] This is shown in FIG. 124. The first DRAM access issued after reset must be a refresh.

[2602] As shown in FIG. 118, the DIU request signals <unit>_diu_rreq, <unit>_diu_wreq are registered at the input of the arbitration block to ease timing. The exceptions are the refresh_req signal, which is generated locally in the sub-block, and cpu_diu_rreq. The CPU read request signal is not registered so as to keep CPU DIU read access latency to a minimum. Since CPU writes are posted, cpu_diu_wreq is registered so that the DAU can process the write at a later juncture. The arbitration logic is coded to perform arbitration of non-CPU requests first and then to gate the result with the CPU requests. In this way the CPU can make the requests available late in the arbitration cycle.

[2603] Note that when RotationSync is set to ‘0’, a modified hierarchy of arbitration is used. This is outlined in section 20.14.12.2.3 on page 280.

[2604] 20.14.12.6 Timeslot Access

[2605] The basic timeslot arbitration is based on the MainTimeslot configuration registers. Arbitration works by the timeslot pointed to by either the current or write lookahead pointer winning arbitration. The pointers then advance to the next timeslot. This was shown in FIG. 90.

[2606] Each main timeslot pointer gets advanced each time it is accessed regardless of whether the slot is used.

[2607] 20.14.12.7 Unused Timeslot Allocation

[2608] If an assigned slot is not used (because its corresponding SoPEC Unit is not requesting) then it is reassigned according to the scheme described in Section 20.10.6.

[2609] Only unused non-CPU accesses are reallocated. CDU write accesses cannot be included in the unused timeslot allocation for write as CDU accesses take 6 cycles. The write accesses which the CDU write could otherwise replace require only 3 or 4 cycles.

[2610] Unused write accesses are re-allocated according to the fixed priority scheme of Table. Unused read timeslots are re-allocated according to the two-level round-robin scheme described in Section 20.10.6.2.

[2611] A pointer points to the most recently re-allocated unit in each of the round-robin levels. If the unit immediately succeeding the pointer is requesting, then this unit wins the arbitration and the pointer is advanced to reflect the new winner. If this is not the case, then the subsequent units (wrapping back eventually to the pointed unit) in the level 1 round-robin are examined. When a requesting unit is found this unit wins the arbitration and the pointer is adjusted. If no unit is requesting then the pointer does not advance and the second level of round-robin is examined in a similar fashion. In the following pseudo-code the bit indices are for the ReadRoundRobinLevel configuration register described in Table.

    //choose the winning arbitration level
    level1 = 0
    level2 = 0
    for i = 0 to 11
        if unit(i) requesting AND ReadRoundRobinLevel(i) = 0 then
            level1 = 1
        if unit(i) requesting AND ReadRoundRobinLevel(i) = 1 then
            level2 = 1

[2612] Round-robin arbitration is effectively a priority assignment with the units assigned a priority according to the round-robin order of Table but starting at the unit currently pointed to.

    //levelptr is pointer of selected round robin level
    //priority is array 0 to 11; index 0 is SCBR(0) etc. from Table
    //assign decreasing priorities from the current pointer; maximum priority is 11
    for i = 1 to 12
        priority((levelptr + i) mod 12) = 12 - i

[2613] The arbitration winner is the one with the highest priority provided it is requesting and its ReadRoundRobinLevel bit points to the chosen level. The levelptr is advanced to the arbitration winner.

[2614] The priority comparison can be done in the hierarchical manner shown in FIG. 125.
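The two-level round-robin selection can be sketched as follows. This is a simplified software model, not the hierarchical comparison of the hardware: the function name, argument layout and return tuple are assumptions, and the search simply walks from the unit after each level's pointer, which is equivalent to picking the highest-priority requester under the priority assignment above.

```python
def round_robin_winner(requesting, level_bits, level1_ptr, level2_ptr, n=12):
    """Pick an unused-read-slot winner using two round-robin levels.

    requesting: list of n bools, True if unit i is requesting
    level_bits: list of n ints from ReadRoundRobinLevel (0 = level 1, 1 = level 2)
    level1_ptr, level2_ptr: most recent winner in each level
    Returns (winner, new_level1_ptr, new_level2_ptr); winner is None if
    no unit is requesting (the real scheme always has refresh available).
    """
    for level, ptr in ((0, level1_ptr), (1, level2_ptr)):
        # Start at the unit immediately after the pointer and wrap back to it.
        for i in range(1, n + 1):
            unit = (ptr + i) % n
            if requesting[unit] and level_bits[unit] == level:
                if level == 0:
                    return unit, unit, level2_ptr
                return unit, level1_ptr, unit
    return None, level1_ptr, level2_ptr
```

Level 2 is only examined when no level 1 unit is requesting, and a level's pointer moves only when that level supplies the winner.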

[2615] 20.14.12.8 How Non-CPU Address Restrictions Affect Arbitration

[2616] Recall from Table “DAU configuration registers,” on page 288, that there are minimum valid DRAM addresses for non-CPU accesses, defined by minNonCPUReadAdr, minDWUWriteAdr and minNonCPUWriteAdr. Similarly, a non-CPU requester may not try to access a location above the high memory mark.

[2617] To ensure compliance with these address restrictions, the following DIU response occurs for any incorrectly addressed non-CPU writes:

[2618] Issue a write acknowledgment at pre-arbitration time, to prevent the write requester from hanging.

[2619] Disregard the incoming write data and write valids and void the pre-arbitration.

[2620] Subsequently re-allocate the write slot at main arbitration time via the round robin.

[2621] For any incorrectly addressed non-CPU reads, the response is:—

[2622] Arbitrate the slot in favour of the scheduled, misbehaving requester.

[2623] Issue the read acknowledgement and rvalids to keep the requester from hanging.

[2624] Intercept the read data coming from the DCU and send back all zeros instead.

[2625] If an invalidly addressed non-CPU access is attempted, then a sticky bit, sticky_invalid_non_cpu_adr, is set in the ArbitrationHistory configuration register. See Table on page 293 for details.

[2626] 20.14.12.9 Refresh Controller Description

[2627] The refresh controller implements the functionality described in detail in Section 20.10.5. Refresh is not included in the timeslot allocations.

[2628] CPU and refresh have priority over other accesses. If the refresh controller is requesting, i.e. refresh_req is asserted, then the refresh request will win any arbitration initiated by re_arbitrate. When the refresh has won the arbitration, refresh_req is de-asserted.

[2629] The refresh counter is reset to RefreshPeriod[8:0], i.e. the number of cycles between each refresh. Every time this counter decrements to 0, a refresh is issued by asserting refresh_req. The counter immediately reloads with the value in RefreshPeriod[8:0] and continues its countdown. It does not wait for an acknowledgment, since the priority of a refresh request supersedes that of any pending non-CPU access and it will be serviced immediately. In this way, a refresh request is guaranteed to occur every (RefreshPeriod[8:0]+1) cycles. A given refresh request may incur some incidental delay in being serviced, due to alignment with DRAM accesses and the possibility of a higher-priority CPU pre-access.

[2630] Refresh is also included in the unused read and write timeslot allocation, having second option on awards to a round-robin position shared with the CPU. A refresh issued as a result of an unused timeslot allocation also causes the refresh counter to reload with the value in RefreshPeriod[8:0]. The first access issued by the DAU after reset must be a refresh. This assures that refreshes for all DRAM words fall within the required 3.2 ms window.

    //issue a refresh request if counter reaches 0 or at reset or for re-allocated slot
    if RefreshPeriod != 0 AND (refresh_cnt == 0 OR diu_soft_reset_n == 0 OR prst_n == 0 OR unused_timeslot_allocation == 1) then
        refresh_req = 1
    //de-assert refresh request when refresh acked
    else if refresh_ack == 1 then
        refresh_req = 0
    //refresh counter
    if refresh_cnt == 0 OR diu_soft_reset_n == 0 OR prst_n == 0 OR unused_timeslot_allocation == 1 then
        refresh_cnt = RefreshPeriod
    else
        refresh_cnt = refresh_cnt - 1

[2631] Refresh can be preceded by a CPU access in the same way as any other access. This is controlled by the CPUPreAccessTimeslots and CPUTotalTimeslots configuration registers. Refresh will therefore not affect CPU performance. A sequence of accesses including refresh might therefore be CPU, refresh, CPU, actual timeslot.
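The counter behaviour can be checked with a small cycle-by-cycle model. The class and method names are hypothetical, and the reset terms (diu_soft_reset_n, prst_n) are deliberately left out of this sketch, so it only covers steady-state operation.

```python
class RefreshCounter:
    """Cycle-level model of the refresh request counter (reset terms omitted)."""

    def __init__(self, refresh_period):
        self.period = refresh_period   # RefreshPeriod[8:0]; 0 means refresh off
        self.count = refresh_period    # refresh_cnt
        self.refresh_req = False

    def tick(self, refresh_ack=False, unused_slot_refresh=False):
        """Advance one clock cycle and return refresh_req."""
        reload = self.count == 0 or unused_slot_refresh
        if self.period != 0 and reload:
            self.refresh_req = True
        elif refresh_ack:
            self.refresh_req = False
        # The counter reloads without waiting for an acknowledgment.
        self.count = self.period if reload else self.count - 1
        return self.refresh_req
```

With RefreshPeriod set to 3 and every request acknowledged on the following cycle, the model raises a request once every 4 cycles, matching the (RefreshPeriod + 1) interval stated in the text.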

[2632] 20.14.12.10 CPU Timeslot Controller Description

[2633] CPU accesses have priority over all other accesses. CPU access is not included in the timeslot allocations. CPU access is controlled by the CPUPreAccessTimeslots and CPUTotalTimeslots configuration registers.

[2634] To avoid the CPU having to wait for its next timeslot it is desirable to have a mechanism for ensuring that the CPU always gets the next available timeslot without incurring any latency on the non-CPU timeslots.

[2635] This is done by defining each timeslot as consisting of a CPU access preceding a non-CPU access. Two counters of 4 bits each are defined, allowing the CPU to get a maximum of (CPUPreAccessTimeslots+1) pre-accesses out of a total of (CPUTotalTimeslots+1) main slots. A timeslot counter starts at CPUTotalTimeslots and decrements every timeslot, while another counter starts at CPUPreAccessTimeslots and decrements every timeslot in which the CPU uses its access. If the pre-access entitlement is used up before (CPUTotalTimeslots+1) slots, no further CPU accesses are allowed. When the CPUTotalTimeslots counter reaches zero both counters are reset to their respective initial values.

[2636] When CPUPreAccessTimeslots is set to zero then only one pre-access will occur during every (CPUTotalTimeslots+1) slots.
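The two-counter scheme can be sketched as below. The class name and the explicit `exhausted` flag are assumptions made so that the model grants exactly (CPUPreAccessTimeslots + 1) pre-accesses per window; the specification describes the counters but not this particular bookkeeping.

```python
class CpuPreAccess:
    """Model of the CPU pre-access entitlement counters (illustrative only)."""

    def __init__(self, pre_access_timeslots, total_timeslots):
        self.pre_init = pre_access_timeslots   # CPUPreAccessTimeslots
        self.total_init = total_timeslots      # CPUTotalTimeslots
        self.pre = pre_access_timeslots
        self.total = total_timeslots
        self.exhausted = False                 # assumption: entitlement-used flag

    def timeslot(self, cpu_requesting):
        """Advance one main timeslot; return True if a CPU pre-access is granted."""
        granted = cpu_requesting and not self.exhausted
        if granted:
            if self.pre == 0:
                self.exhausted = True          # last entitled pre-access used
            else:
                self.pre -= 1
        if self.total == 0:
            # End of the window: reload both counters.
            self.pre, self.total = self.pre_init, self.total_init
            self.exhausted = False
        else:
            self.total -= 1
        return granted
```

With CPUPreAccessTimeslots = 0 and CPUTotalTimeslots = 3, a continuously requesting CPU is granted one pre-access in every four main timeslots.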

[2637] 20.14.12.10.1 Conserving CPU Pre-Accesses

[2638] In section 20.10.6.2.1 on page 249, it is described how the CPU can be allowed to participate in the unused read round-robin scheme. When enabled by the configuration bit EnableCPURoundRobin, the CPU shares a joint position in the round robin with refresh. In this case, the CPU has priority, ahead of refresh, in availing of any unused slot awarded to this position.

[2639] Such CPU round-robin accesses do not count towards depleting the CPU's quota of pre-accesses, specified by CPUPreAccessTimeslots. Note that in order to conserve these pre-accesses, the arbitration logic, when faced with the choice of servicing a CPU request either by a pre-access or by an immediately following unused read slot which the CPU is poised to win, will opt for the latter.

[2640] 20.14.13 Read and Write Data Multiplexor Sub-Block

TABLE 138 Read and Write Multiplexor Sub-block IO Definition

Port name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | System Clock
prst_n | 1 | In | System reset, synchronous active low
DIU Read Interface to SoPEC Units
diu_data | 64 | Out | Data from DIU to SoPEC Units except CPU. First 64-bits is bits 63:0 of 256 bit word. Second 64-bits is bits 127:64 of 256 bit word. Third 64-bits is bits 191:128 of 256 bit word. Fourth 64-bits is bits 255:192 of 256 bit word.
dram_cpu_data | 256 | Out | 256-bit data from DRAM to CPU.
diu_<unit>_rvalid | 1 | Out | Signal from DIU telling SoPEC Unit that valid read data is on the diu_data bus
DIU Write Interface to SoPEC Units
<unit>_diu_data | 64 | In | Data from SoPEC Unit to DIU except CPU. First 64-bits is bits 63:0 of 256 bit word. Second 64-bits is bits 127:64 of 256 bit word. Third 64-bits is bits 191:128 of 256 bit word. Fourth 64-bits is bits 255:192 of 256 bit word.
cpu_diu_wdata | 128 | In | Write data from CPU to DIU.
<unit>_diu_wvalid | 1 | In | Signal from SoPEC Unit indicating that data on <unit>_diu_data is valid. Note that “unit” refers to non-CPU requesters only.
cpu_diu_wdatavalid | 1 | In | Write enable for the CPU posted write buffer. Also confirms the validity of cpu_diu_wdata.
diu_cpu_write_rdy | 1 | Out | Indicator that the CPU posted write buffer is empty.
Inputs from CPU Configuration and Arbitration Logic Sub-block
arb_gnt | 1 | In | Signal lasting 1 cycle which indicates arbitration has occurred and arb_sel is valid.
arb_sel | 5 | In | Signal indicating which requesting SoPEC Unit has won arbitration. Encoding is described in Table .
dir_sel | 2 | In | Signal indicating which sense of access is associated with arb_sel. 00: issue non-CPU write, 01: read winner, 10: write winner, 11: refresh winner
Outputs to Command Multiplexor Sub-block
write_data_valid | 2 | Out | Signal indicating that valid write data is available for the current command. 00 = not valid, 01 = CPU write data valid, 10 = non-CPU write data valid, 11 = both CPU and non-CPU write data valid
wdata | 256 | Out | 256-bit non-CPU write data
cpu_wdata | 32 | Out | 32-bit CPU write data
Inputs from Command Multiplexor Sub-block
write_data_accept | 2 | In | Signal indicating the Command Multiplexor has accepted the write data from the write multiplexor. 00 = not valid, 01 = accepts CPU write data, 10 = accepts non-CPU write data, 11 = not valid
Inputs from DCU
dcu_dau_rdata | 256 | In | 256-bit read data from DCU.
dcu_dau_rvalid | 1 | In | Signal indicating valid read data on dcu_dau_rdata.
Outputs to CPU Configuration and Arbitration Logic Sub-block
read_cmd_rdy | 2 | Out | Signal indicating that read multiplexor is ready for next read command. 00 = not ready, 01 = ready for CPU read, 10 = ready for non-CPU read, 11 = ready for both CPU and non-CPU reads
write_cmd_rdy | 2 | Out | Signal indicating that write multiplexor is ready for next write command. 00 = not ready, 01 = ready for CPU write, 10 = ready for non-CPU write, 11 = ready for both CPU and non-CPU writes
Debug Outputs to CPU Configuration and Arbitration Logic Sub-block
read_sel | 5 | Out | Signal indicating the SoPEC Unit for which the current read transaction is occurring. Encoding is described in Table .
read_complete | 1 | Out | Signal indicating that read transaction to SoPEC Unit indicated by read_sel is complete.

[2641] 20.14.13.1 Read Multiplexor Logic Description

[2642] The Read Multiplexor has 2 read channels:

[2643] a separate read bus for the CPU, dram_cpu_data[255:0].

[2644] and a shared read bus for the rest of SoPEC, diu_data[63:0].

[2645] The validity of data on the data busses is indicated by signals diu_<unit>_rvalid.

[2646] Timing waveforms for non-CPU and CPU DIU read accesses are shown in FIG. 90 and FIG. 91, respectively.

[2647] The Read Multiplexor timing is shown in FIG. 127. FIG. 127 shows both CPU and non-CPU reads. The CPU and non-CPU channels are independent, i.e. data can be output on the CPU read bus while non-CPU data is being transmitted in 4 cycles over the shared 64-bit read bus. CPU read data, dram_cpu_data[255:0], is available in the same cycle as output from the DCU. CPU read data needs to be registered immediately on entering the CPU by a flip-flop enabled by the diu_cpu_rvalid signal.

[2648] To ease timing, non-CPU read data from the DCU is first registered in the Read Multiplexor by capturing it in the shared read data buffer of FIG. 126 enabled by the dcu_dau_rvalid signal.

[2649] The data is then partitioned in 64-bit words on diu_data[63:0].
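The partitioning of a 256-bit DRAM word onto the 64-bit shared read bus can be illustrated with a minimal sketch. The word ordering (bits 63:0 first) follows Table 138; the function name and the use of a Python integer to hold the 256-bit word are illustrative assumptions only.

```python
MASK64 = (1 << 64) - 1


def split_256_into_64(word256: int) -> list:
    """Return the four 64-bit words of a 256-bit value, bits 63:0 first."""
    return [(word256 >> (64 * i)) & MASK64 for i in range(4)]


# Build a 256-bit word whose four 64-bit lanes hold 0xA, 0xB, 0xC and 0xD.
word = (0xD << 192) | (0xC << 128) | (0xB << 64) | 0xA
words = split_256_into_64(word)
```

Each list element corresponds to one cycle of diu_data[63:0].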

[2650] 20.14.13.1.1 Non-CPU Read Data Coherency

[2651] Note that for data coherency reasons, a non-CPU read will always result in read data being returned to the requester which includes the after-effects of any pending (i.e. pre-arbitrated, but not yet executed) non-CPU write to the same address, which is currently cached in the non-CPU write buffer. This is shown graphically in the corresponding Figure.

[2652] Should the pending write be partially masked, then the read data returned must take account of that mask. Pending, masked writes by the CDU and SCB, as well as all unmasked non-CPU writes are fully supported.

[2653] Since CPU writes are dealt with on a dedicated write channel, no attempt is made to implement coherency between posted, unexecuted CPU writes and non-CPU reads to the same address.
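The effect of a pending masked write on returned read data can be sketched as follows. This is an illustration only: the mask granularity (one mask bit per byte, bit set meaning the byte is written) and the function name are assumptions, since the mask format is not defined in this section.

```python
WORD_BYTES = 32  # one 256-bit DRAM word


def coherent_read(dram_word: bytes, pending_write: bytes, byte_mask: int) -> bytes:
    """Merge a pending masked write into the data read from DRAM.

    Assumption: bit i of byte_mask set means byte i of pending_write
    replaces byte i of the DRAM word.
    """
    merged = bytearray(dram_word)
    for i in range(WORD_BYTES):
        if (byte_mask >> i) & 1:
            merged[i] = pending_write[i]
    return bytes(merged)


old = bytes(WORD_BYTES)            # DRAM currently holds all zeros
new = bytes([0xFF] * WORD_BYTES)   # pending write data
result = coherent_read(old, new, 0b1111)  # only the low 4 bytes are written
```

An unmasked write corresponds to a mask with all 32 bits set, in which case the read simply returns the pending write data.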

[2654] 20.14.13.1.2 Read Multiplexor Command Queue

[2655] When the Arbitration Logic sub-block issues a read command the associated value of arb_sel[4:0], which indicates which SoPEC Unit has won arbitration, is written into a buffer, the read command queue.

    write_en = arb_gnt AND dir_sel[1:0] == "01"
    if write_en == 1 then
        WRITE arb_sel into read command queue

[2656] The encoding of arb_sel[4:0] is given in Table. dir_sel[1:0]=="01" indicates that the operation is a read. The read command queue is shown in FIG. 128.

[2657] The command queue could contain values of arb_sel[4:0] for 3 reads at a time.

[2658] In the scenario of FIG. 127 the command queue can contain 2 values of arb_sel[4:0] i.e. for the simultaneous CDU and CPU accesses.

[2659] In the scenario of FIG. 130, the command queue can contain 3 values of arb_sel[4:0] i.e. at the time of the second dcu_dau_rvalid pulse the command queue will contain an arb_sel[4:0] for the arbitration performed in that cycle, and the two previous arb_sel[4:0] values associated with the data for the first two dcu_dau_rvalid pulses, the data associated with the first dcu_dau_rvalid pulse not having been fully transferred over the shared read data bus.

[2660] The read command queue is specified as 4 deep so it is never expected to fill.

[2661] The top of the command queue is a signal read_type[4:0] which indicates the destination of the current read data. The encoding of read_type[4:0] is given in Table.

[2662] 20.14.13.1.3 CPU Reads

[2663] Read data for the CPU goes straight out on dram_cpu_data[255:0] and dcu_dau_rvalid is output on diu_cpu_rvalid.

[2664] cpu_read_complete(0) is asserted when a CPU read at the top of the read command queue occurs. cpu_read_complete(0) causes the read command queue to be popped.

    cpu_read_complete(0) = (read_type[4:0] == CPU read) AND (dcu_dau_rvalid == 1)

[2665] If the current read command queue location points to a non-CPU access and the second read command queue location points to a CPU access then the next dcu_dau_rvalid pulse received is associated with a CPU access. This is the scenario illustrated in FIG. 127. The dcu_dau_rvalid pulse from the DCU must be output to the CPU as diu_cpu_rvalid. This is achieved by using cpu_read_complete(1) to multiplex dcu_dau_rvalid to diu_cpu_rvalid. cpu_read_complete(1) is also used to pop the second from top read command queue location from the read command queue.

    cpu_read_complete(1) = (read_type == non-CPU read) AND SECOND(read_type == CPU read) AND (dcu_dau_rvalid == 1)

[2666] 20.14.13.1.4 Multiplexing dcu_dau_rvalid

[2667] read_type[4:0] and cpu_read_complete(1) multiplex the data valid signal, dcu_dau_rvalid, from the DCU, between the CPU and the shared read bus logic. diu_cpu_rvalid is the read valid signal going to the CPU. noncpu_rvalid is the read valid signal used by the Read Multiplexor control logic to generate read valid signals for non-CPU reads.

    if read_type[4:0] == CPU-read then
        //select CPU
        diu_cpu_rvalid := 1
        noncpu_rvalid := 0
    else if (read_type[4:0] == non-CPU-read) AND SECOND(read_type[4:0] == CPU-read) AND dcu_dau_rvalid == 1 then
        //select CPU
        diu_cpu_rvalid := 1
        noncpu_rvalid := 0
    else
        //select shared read bus logic
        diu_cpu_rvalid := 0
        noncpu_rvalid := 1
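The steering of dcu_dau_rvalid between the two channels may be sketched as a pure function of the top two read command queue entries. This is one interpretation only: the sketch gates both outputs with dcu_dau_rvalid so that they represent the valid pulses actually delivered, and the function and parameter names are hypothetical.

```python
def steer_rvalid(top_is_cpu: bool, second_is_cpu: bool, dcu_dau_rvalid: bool):
    """Return (diu_cpu_rvalid, noncpu_rvalid) for one cycle.

    top_is_cpu    -- read_type at the top of the queue is a CPU read
    second_is_cpu -- the second queue location is a CPU read
    Assumption: with no dcu_dau_rvalid pulse, neither output is asserted.
    """
    if not dcu_dau_rvalid:
        return (False, False)
    if top_is_cpu or second_is_cpu:
        # Either the CPU read is at the top, or a non-CPU read is still
        # draining over the shared bus and this pulse belongs to the CPU.
        return (True, False)
    return (False, True)


cpu_on_top = steer_rvalid(True, False, True)
cpu_second = steer_rvalid(False, True, True)
shared_only = steer_rvalid(False, False, True)
```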

[2668] 20.14.13.1.5 Non-CPU Reads

[2669] Read data for the shared read bus is registered in the shared read data buffer using noncpu_rvalid. The shared read buffer has 5 locations of 64 bits with separate read pointer, read_ptr[2:0], and write pointer, write_ptr[2:0].

    if noncpu_rvalid == 1 and (4 spaces in shared read buffer) then
        shared_read_data_buffer[write_ptr] = dcu_dau_rdata[63:0]
        shared_read_data_buffer[write_ptr+1] = dcu_dau_rdata[127:64]
        shared_read_data_buffer[write_ptr+2] = dcu_dau_rdata[191:128]
        shared_read_data_buffer[write_ptr+3] = dcu_dau_rdata[255:192]

[2670] The data written into the shared read buffer must be output to the correct SoPEC DIU read requestor according to the value of read_type[4:0] at the top of the command queue. The data is output 64 bits at a time on diu_data[63:0] according to a multiplexor controlled by read_ptr[2:0].

[2671] diu_data[63:0] = shared_read_data_buffer[read_ptr]

[2672] FIG. 126 shows how read_type[4:0] also selects which shared read bus requester's diu_<unit>_rvalid signal is connected to shared_rvalid. Since the data from the DCU is registered in the Read Multiplexor, shared_rvalid is a delayed version of noncpu_rvalid.

[2673] When the read valid, diu_<unit>_rvalid, for the command associated with read_type[4:0] has been asserted for 4 cycles then a signal shared_read_complete is asserted. This indicates that the read has completed. shared_read_complete causes the value of read_type[4:0] in the read command queue to be popped.

[2674] A state machine for shared read bus access is shown in FIG. 129. This shows the generation of shared_rvalid, shared_read_complete and the shared read data buffer read pointer, read_ptr[2:0], being incremented.

[2675] Some points to note from FIG. 129 are:

[2676] shared_rvalid is asserted the cycle after dcu_dau_rvalid associated with a shared read bus access. This matches the cycle delay in capturing dcu_dau_rdata[255:0] in the shared read data buffer. shared_rvalid remains asserted in the case of back to back shared read bus accesses.

[2677] shared_read_complete is asserted in the last shared_rvalid cycle of a non-CPU access. shared_read_complete causes the shared read data queue to be popped.
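The behaviour of the shared read data buffer described above can be modelled with a minimal sketch. The depth of 5 and the 4-word write come from the text; the modulo-5 pointer wrap, the occupancy counter and all class and method names are assumptions made for illustration and are not taken from FIG. 129.

```python
MASK64 = (1 << 64) - 1


class SharedReadBuffer:
    DEPTH = 5  # "5 locations of 64 bits"

    def __init__(self):
        self.slots = [0] * self.DEPTH
        self.read_ptr = 0
        self.write_ptr = 0
        self.count = 0       # occupied locations (assumed bookkeeping)
        self.words_sent = 0  # words of the current access already output

    def write(self, word256: int) -> bool:
        """Capture a 256-bit DCU word if 4 locations are free."""
        if self.DEPTH - self.count < 4:
            return False
        for i in range(4):
            slot = (self.write_ptr + i) % self.DEPTH
            self.slots[slot] = (word256 >> (64 * i)) & MASK64
        self.write_ptr = (self.write_ptr + 4) % self.DEPTH
        self.count += 4
        return True

    def read_cycle(self):
        """Output one 64-bit word; returns (diu_data, shared_read_complete)."""
        if self.count == 0:
            return None
        data = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.DEPTH
        self.count -= 1
        self.words_sent = (self.words_sent + 1) % 4
        return data, self.words_sent == 0


buf = SharedReadBuffer()
buf.write((4 << 192) | (3 << 128) | (2 << 64) | 1)
cycles = [buf.read_cycle() for _ in range(4)]
```

In this sketch shared_read_complete is true only on the fourth word of each access, which matches the 4-cycle transfer described for the shared read bus.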

[2678] 20.14.13.1.6 Read Command Queue Read Pointer Logic

[2679] The read command queue read pointer logic works as follows.

    if shared_read_complete == 1 OR cpu_read_complete(0) == 1 then
        POP top of read command queue
    if cpu_read_complete(1) == 1 then
        POP second read command queue location
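The pop behaviour of the read command queue can be sketched as below. The depth of 4 is from the text; the class and method names are hypothetical, and the handling of both pops arriving in the same cycle (both evaluated against the queue contents before the update) is an assumption.

```python
from collections import deque


class ReadCommandQueue:
    DEPTH = 4  # "specified as 4 deep"

    def __init__(self):
        self.entries = deque()

    def push(self, arb_sel):
        """Record the arbitration winner for an issued read command."""
        if len(self.entries) >= self.DEPTH:
            raise OverflowError("read command queue full")
        self.entries.append(arb_sel)

    def update(self, shared_read_complete, cpu_read_complete0, cpu_read_complete1):
        """Apply one cycle of pops, indexed against the pre-update queue."""
        remove = set()
        if shared_read_complete or cpu_read_complete0:
            remove.add(0)  # POP top of read command queue
        if cpu_read_complete1:
            remove.add(1)  # POP second read command queue location
        self.entries = deque(
            v for i, v in enumerate(self.entries) if i not in remove
        )


q = ReadCommandQueue()
for unit in ("CDU", "CPU", "LBD"):
    q.push(unit)
q.update(False, False, True)   # CPU read behind a draining CDU read completes
after_cpu = list(q.entries)
q.update(True, False, False)   # CDU transfer over the shared bus completes
after_cdu = list(q.entries)
```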

[2680] 20.14.13.1.7 Debug Signals

[2681] shared_read_complete and cpu_read_complete together define read_complete, which indicates to the debug logic that a read has completed. The source of the read is indicated on read_sel[4:0].

    read_complete = shared_read_complete OR cpu_read_complete(0) OR cpu_read_complete(1)
    if cpu_read_complete(1) == 1 then
        read_sel := SECOND(read_type)
    else
        read_sel := read_type

[2682] 20.14.13.1.8 Flow Control

[2683] There are separate indications that the Read Multiplexor is able to accept CPU and shared read bus commands from the Arbitration Logic. These are indicated by read_cmd_rdy[1:0].

[2684] The Arbitration Logic can always issue CPU reads except if the read command queue fills. The read command queue should be large enough that this should never occur.

    //Read Multiplexor ready for Arbitration Logic to issue CPU reads
    read_cmd_rdy[0] = read command queue not full

[2685] For the shared read data, the Read Multiplexor deasserts the shared read bus read_cmd_rdy[1] indication until a space is available in the read command queue. The read command queue should be large enough that this should never occur.

[2686] read_cmd_rdy[1] is also deasserted to provide flow control back to the Arbitration Logic to keep the shared read data bus just full.

    //Read Multiplexor not ready for Arbitration Logic to issue non-CPU reads
    read_cmd_rdy[1] = (read command queue not full) AND (flow_control == 0)

[2687] The flow control condition is that DCU read data from the second of two back-to-back shared read bus accesses becomes available. This causes read_cmd_rdy[1] to de-assert for 1 cycle, resulting in a repeated MSN2 DCU state. The timing is shown in FIG. 130.

    flow_control = (read_type[4:0] == non-CPU read) AND SECOND(read_type[4:0] == non-CPU read) AND (current DCU state == MSN2) AND (previous DCU state == MSN1)
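The flow control equations can be restated as a short sketch. The DCU state names MSN1 and MSN2 are from the text; representing them as strings, and the function names, are illustrative assumptions.

```python
def flow_control(top_is_noncpu: bool, second_is_noncpu: bool,
                 dcu_state: str, prev_dcu_state: str) -> bool:
    """True in the cycle where a back-to-back shared read must stall."""
    return (top_is_noncpu and second_is_noncpu
            and dcu_state == "MSN2" and prev_dcu_state == "MSN1")


def read_cmd_rdy(queue_full: bool, fc: bool):
    """Return (read_cmd_rdy[0], read_cmd_rdy[1])."""
    cpu_rdy = not queue_full
    noncpu_rdy = (not queue_full) and (not fc)
    return cpu_rdy, noncpu_rdy


stall = flow_control(True, True, "MSN2", "MSN1")
rdy_during_stall = read_cmd_rdy(queue_full=False, fc=stall)
```

Note that in this sketch a stall removes only the non-CPU ready indication; CPU reads remain accepted as long as the command queue is not full.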

[2688] FIG. 130 shows a series of back to back transfers over the shared read data bus. The exact timing of the implementation must not introduce any additional latency on shared read bus read transfers i.e. arbitration must be re-enabled just in time to keep back to back shared read bus data full.

[2689] The following sequence of events is illustrated in FIG. 130:

[2690] Data from the first DRAM access is written into the shared read data buffer.

[2691] Data from the second access is available 3 cycles later, but its transfer into the shared read buffer is delayed by a cycle, due to the MSN2 stall condition. (During this delay, read data for access 2 is maintained at the output of the DRAM.) A similar 1-cycle delay is introduced for every subsequent read access until the back-to-back sequence comes to an end.

[2692] Note that arbitration always occurs during the last MSN2 state of any access. So, for the second and later of any back-to-back non-CPU reads, arbitration is delayed by one cycle, i.e. it occurs every fourth cycle instead of the standard every third.

[2693] This mechanism provides flow control back to the Arbitration Logic sub-block. Using this mechanism means that the access rate will be limited to whichever takes longer: DRAM access or transfer of read data over the shared read data bus. CPU reads are always accepted by the Read Multiplexor.

[2694] 20.14.13.2 Write Multiplexor Logic Description

[2695] The Write Multiplexor supplies write data to the DCU.

[2696] There are two separate write channels, one for CPU data on cpu_diu_wdata[127:0], one for non-CPU data on non_cpu_wdata[255:0]. A signal write_data_valid[1:0] indicates to the Command Multiplexor that the data is valid. The Command Multiplexor then asserts a signal write_data_accept[1:0] indicating that the data has been captured by the DRAM and the appropriate channel in the Write Multiplexor can accept the next write data.

[2697] Timing waveforms for write accesses are shown in FIG. 92 to FIG. 94.

[2698] There are 3 types of write accesses:

[2699] CPU accesses

[2700] CPU write data on cpu_diu_wdata[127:0] is output on cpu_wdata[127:0]. Since CPU writes are posted, a local buffer is used to store the write data, address and mask until the CPU wins arbitration. This buffer is one position deep. write_data_valid[0], which is synonymous with !diu_cpu_write_rdy, remains asserted until the Command Multiplexor indicates it has been written to the DRAM by asserting write_data_accept[0]. The CPU write buffer can then accept new posted writes.

[2701] For non-CPU writes, the Write Multiplexor multiplexes the write data from the DIU write requester to the write data buffer and the <unit>_diu_wvalid signal to the write multiplexor control logic.

[2702] CDU accesses

[2703] 64-bits of write data each for a masked write to a separate 256-bit word are transferred to the Write Multiplexor over 4 cycles.

[2704] When a CDU write is selected the first 64-bits of write data on cdu_diu_wdata[63:0] are multiplexed to non_cpu_wdata[63:0]. write_data_valid[1] is asserted to indicate a non-CPU access when cdu_diu_wvalid is asserted. The data is also written into the first location in the write data buffer. This is so that the data can continue to be output on non_cpu_wdata[63:0] and write_data_valid[1] remains asserted until the Command Multiplexor indicates it has been written to the DRAM by asserting write_data_accept[1]. Data continues to be accepted from the CDU and is written into the other locations in the write data buffer. Successive write_data_accept[1] pulses cause the successive 64-bit data words to be output on wdata[63:0] together with write_data_valid[1]. The last write_data_accept[1] means the write buffer is empty and new write data can be accepted.

[2705] Other write accesses.

[2706] 256-bits of write data are transferred to the Write Multiplexor over 4 successive cycles.

[2707] When a write is selected the first 64-bits of write data on <unit>_diu_wdata[63:0] are written into the write data buffer. The next 64-bits of data are written to the buffer in successive cycles. Once the last 64-bit word is available on <unit>_diu_wdata[63:0] the entire word is output on non_cpu_wdata[255:0], write_data_valid[1] is asserted to indicate a non-CPU access, and the last 64-bit word is written into the last location in the write data buffer. Data continues to be output on non_cpu_wdata[255:0] and write_data_valid[1] remains asserted until the Command Multiplexor indicates it has been written to the DRAM by asserting write_data_accept[1]. New write data can then be written into the write buffer.

[2708] CPU write multiplexor control logic

[2709] When the Command Multiplexor has issued the CPU write it asserts write_data_accept[0]. write_data_accept[0] causes the write multiplexor to assert write_cmd_rdy[0].

[2710] The signal write_cmd_rdy[0] tells the Arbitration Logic sub-block that it can issue another CPU write command i.e. the CPU write data buffer is empty.

[2711] Non-CPU Write Multiplexor Control Logic

[2712] The signal write_cmd_rdy[1] tells the Arbitration Logic sub-block that the Write Multiplexor is ready to accept another non-CPU write command. When write_cmd_rdy[1] is asserted the Arbitration Logic can issue a write command to the Write Multiplexor. It does this by writing the value of arb_sel[4:0] which indicates which SoPEC Unit has won arbitration into a write command register, write_cmd[3:0].

    write_en = arb_gnt AND dir_sel[1] == 1 AND arb_sel == non-CPU
    if write_en == 1 then
        write_cmd = arb_sel

[2713] The encoding of arb_sel[4:0] is given in Table. dir_sel[1]==1 indicates that the operation is a write. arb_sel[4:0] is only written to the write command register if the write is a non-CPU write.

[2714] A rule was introduced in Section 20.7.2.3 Interleaving read and write accesses to the effect that non-CPU write accesses would not be allocated adjacent timeslots. This means that a single write command register is required.

[2715] The write command register, write_cmd[3:0], indicates the source of the write data. write_cmd[3:0] multiplexes the write data <unit>_diu_wdata, and the data valid signal, <unit>_diu_wvalid, from the selected write requestor to the write data buffer. Note that CPU write data is not included in the multiplex as the CPU has its own write channel. The <unit>_diu_wvalid pulses are counted to generate the signal word_sel[1:0] which decides which 64-bit word of the write data buffer is to store the data from <unit>_diu_wdata.

    //when the Command Multiplexor accepts the write data
    if write_data_accept[1] == 1 then
        //reset the word select signal
        word_sel[1:0] = 00
    //when wvalid is asserted
    if wvalid == 1 then
        //increment the word select signal
        if word_sel[1:0] == 11 then
            word_sel[1:0] = 00
        else
            word_sel[1:0] = word_sel[1:0] + 1

[2716] wvalid is the <unit>_diu_wvalid signal multiplexed by write_cmd[3:0]. word_sel[1:0] is reset when the Command Multiplexor accepts the write data. This is to ensure that word_sel[1:0] always starts at 00 for the first wvalid pulse of a 4 cycle write data transfer.
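The word select counter can be sketched as a next-state function. The counting and reset behaviour follow the pseudocode above; the function name is hypothetical, and giving the reset priority when write_data_accept[1] and wvalid are asserted in the same cycle is an assumption, since that case is not specified in the text.

```python
def next_word_sel(word_sel: int, wvalid: bool, write_data_accept: bool) -> int:
    """Return the next value of word_sel[1:0]."""
    if write_data_accept:
        return 0                   # reset the word select signal
    if wvalid:
        return (word_sel + 1) % 4  # increment, wrapping from 11 to 00
    return word_sel


# Four wvalid pulses of one 256-bit transfer: buffer slots 0, 1, 2, 3.
sel = 0
slots_written = []
for _ in range(4):
    slots_written.append(sel)
    sel = next_word_sel(sel, wvalid=True, write_data_accept=False)
```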

[2717] The write command register is able to accept the next write when the Command Multiplexor accepts the write data by asserting write_data_accept[1]. Only the last write_data_accept[1] pulse associated with a CDU access (there are 4) will cause the write command register to be ready to accept the next write data.

[2718] Flow control back to the Command Multiplexor

[2719] write_cmd_rdy[0] is asserted when the CPU data buffer is empty.

[2720] write_cmd_rdy[1] is asserted when both the write command register and the write data buffer are empty.

[2721] PEP Subsystem

[2722] 21 PEP Controller Unit (PCU)

[2723] 21.1 Overview

[2724] The PCU has three functions:

[2725] The first is to act as a bus bridge between the CPU-bus and the PCU-bus for reading and writing PEP configuration registers.

[2726] The second is to support page banding by allowing the PEP blocks to be reprogrammed between bands by retrieving commands from DRAM instead of being programmed directly by the CPU.

[2727] The third is to send register debug information to the RDU, within the CPU subsystem, when the PCU is in Debug Mode.

[2728] 21.2 Interfaces Between PCU and Other Units

[2729] 21.3 Bus Bridge

[2730] The PCU is a bus-bridge between the CPU-bus and the PCU-bus. The PCU is a slave on the CPU-bus but is the only master on the PCU-bus. See the corresponding Figure.

[2731] 21.3.1 CPU Accessing PEP

[2732] All the blocks in the PEP can be addressed by the CPU via the PCU. The MMU in the CPU-subsystem will decode a PCU select signal, cpu_pcu_sel, for all the PCU mapped addresses (see section 11.4.3 on page 69). Using cpu_adr bits 15-12 the PCU will decode individual block selects for each of the blocks within the PEP. The PEP blocks then decode the remaining address bits needed to address their PCU-bus mapped registers. Note: the CPU is only permitted to perform supervisor-mode data-type accesses of the PEP, i.e. cpu_acode=11. If the PCU is selected by the CPU and any other code is present on the cpu_acode bus the access is ignored by the PCU and the pcu_cpu_berr signal is strobed.

[2733] CPU commands have priority over DRAM commands. When the PCU is executing each set of four commands retrieved from DRAM the CPU can access PCU-bus registers. In the case that DRAM commands are being executed and the CPU resets the CmdSource to zero, the contents of the DRAM CmdFifo is invalidated and no further commands from the fifo are executed. The CmdPending and NextBandCmdEnable work registers are also cleared.

[2734] When a DRAM command writes to the CmdAdr register it means the next DRAM access will occur at the address written to CmdAdr. Therefore if the JUMP instruction is the first command in a group of four, the other three commands get executed and then the PCU will issue a read request to DRAM at the address specified by the JUMP instruction. If the JUMP instruction is the second command then the following two commands will be executed before the PCU requests from the new DRAM address specified by the JUMP instruction etc. Therefore the PCU will always execute the remaining commands in each four command group before carrying out the JUMP instruction.
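The deferred effect of a JUMP within a four-command group can be illustrated with a small simulation. This is a sketch under stated assumptions: commands are modelled as (register name, value) pairs rather than 64-bit words, DRAM is a dictionary keyed by 256-bit word address, the fetch address is assumed to advance by one 256-bit word when no JUMP occurs, and the function name run_pcu is hypothetical.

```python
def run_pcu(dram: dict, start_adr: int, max_groups: int = 16) -> list:
    """Execute DRAM command groups; return the commands in execution order."""
    executed = []
    cmd_adr = start_adr
    for _ in range(max_groups):
        group = dram[cmd_adr]       # one 256-bit read = four commands
        next_adr = cmd_adr + 1      # assumption: sequential fetch by default
        for reg, value in group:
            executed.append((reg, value))
            if reg == "CmdSource" and value == 0:
                return executed     # flush the rest and stop fetching
            if reg == "CmdAdr":
                next_adr = value    # JUMP: takes effect at the next fetch
        cmd_adr = next_adr
    return executed


dram = {
    0: [("CmdAdr", 8), ("A", 1), ("B", 2), ("C", 3)],   # JUMP is first
    8: [("D", 4), ("CmdSource", 0), ("E", 5), ("F", 6)],
}
trace = run_pcu(dram, start_adr=0)
```

In this trace the three commands following the JUMP are still executed before the fetch from address 8, and the commands after the write of zero to CmdSource are discarded.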

[2735] 21.4 Page Banding

[2736] The PCU can be programmed to associate microcode in DRAM with each finishedband signal. When a finishedband signal is asserted the PCU will read commands from DRAM and execute these commands. These commands are each 64-bits (see Section 21.8.5) and consist of 32 address bits and 32 data bits and allow PCU mapped registers to be programmed directly by the PCU.

[2737] If more than one finishedband signal is received at the same time, or others are received while microcode is already executing, the PCU will hold the commands as pending, and will execute them at the first opportunity.

[2738] Each microcode program associated with cdu_finishedband, lbd_finishedband and te_finishedband would simply restart the appropriate unit with new addresses, a total of about 4 or 5 microcode instructions. As well, or alternatively, pcu_finishedband can be used to set up all of the units and therefore involves many more instructions. This minimizes the time that a unit is idle in between bands. The pcu_finishedband control signal is issued once the specified combination of CDU, LBD and TE (programmed in BandSelectMask) have finished their processing for a band.

[2739] 21.5 Interrupts, Address Legality and Security

[2740] Interrupts are generated when the various page expansion units have finished a particular band of data from DRAM. The cdu_finishedband, lbd_finishedband and te_finishedband signals are combined in the PCU into a single interrupt pcu_finishedband which is exported by the PCU to the interrupt controller.

[2741] The PCU mapped registers should only be accessible from Supervisor Data Mode. The area of DRAM where PCU commands are stored should be a Supervisor Mode only DRAM area, although this is not enforced by the PCU.

[2742] When the PCU is executing commands from DRAM, any block-address decoded from a command which is not part of the PEP block-address map will cause the PCU to ignore the command and strobe the pcu_icu_address_invalid interrupt signal. The CPU can then interrogate the PCU to find the source of the illegal command. The MMU will ensure that the CPU cannot address an invalid PEP subsystem block.

[2743] When the PCU is executing commands from DRAM, any address decoded from a command which is not part of the PEP address map will cause the PCU to:

[2744] Cease execution of current command and flush all remaining commands already retrieved from DRAM.

[2745] Clear CmdPending work-register.

[2746] Clear NextBandCmdEnable registers.

[2747] Set CmdSource to zero.

[2748] In addition to cancelling all current and pending DRAM accesses the PCU strobes the pcu_icu_address_invalid interrupt signal. The CPU can then interrogate the PCU to find the source of the illegal command.

[2749] 21.6 Debug Mode

[2750] When there is a need to monitor the (possibly changing) value in any PEP configuration register, the PCU may be placed in Debug Mode. This is done via the CPU setting a certain Debug Address register within the PCU. Once in Debug Mode the PCU continually reads the target PEP configuration register and sends the read value to the RDU. Debug Mode has the lowest priority of all PCU functions: if the CPU wishes to perform an access or there are DRAM commands to be executed they will interrupt the Debug access, and the PCU will resume Debug access once a CPU or DRAM command has completed.

[2751] 21.7 Implementation

[2752] 21.7.1 Definitions of I/O

TABLE 139 PCU Port List
Port Name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | SoPEC functional clock
prst_n | 1 | In | Active-low, synchronous reset in pclk domain
End of Band Functionality
cdu_finishedband | 1 | In | Finished band signal from CDU
lbd_finishedband | 1 | In | Finished band signal from LBD
te_finishedband | 1 | In | Finished band signal from TE
pcu_finishedband | 1 | Out | Asserted once the specified combination of CDU, LBD, and TE have finished their processing for a band.
PCU address error
pcu_icu_address_invalid | 1 | Out | Strobed if PCU decodes a non PEP address from commands retrieved from DRAM or CPU.
CPU Subsystem Interface Signals
cpu_adr[15:2] | 14 | In | CPU address bus. 14 bits are required to decode the address space for the PEP.
cpu_dataout[31:0] | 32 | In | Shared write data bus from the CPU
pcu_cpu_data[31:0] | 32 | Out | Read data bus to the CPU
cpu_rwn | 1 | In | Common read/not-write signal from the CPU
cpu_acode[1:0] | 2 | In | CPU Access Code signals. These decode as follows: 00 - User program access; 01 - User data access; 10 - Supervisor program access; 11 - Supervisor data access
cpu_pcu_sel | 1 | In | Block select from the CPU. When cpu_pcu_sel is high both cpu_adr and cpu_dataout are valid
pcu_cpu_rdy | 1 | Out | Ready signal to the CPU. When pcu_cpu_rdy is high it indicates the last cycle of the access. For a write cycle this means cpu_dataout has been registered by the block and for a read cycle this means the data on pcu_cpu_data is valid.
pcu_cpu_berr | 1 | Out | Bus error signal to the CPU indicating an invalid access.
pcu_cpu_debug_valid | 1 | Out | Debug Data valid on pcu_cpu_data bus. Active high.
PCU Interface to PEP blocks
pcu_adr[11:2] | 10 | Out | PCU address bus. The 10 least significant bits of cpu_adr[15:2] allow 1024 32-bit word addressable locations per PEP block. Only the number of bits required to decode the address space are exported to each block.
pcu_dataout[31:0] | 32 | Out | Shared write data bus from the PCU
<unit>_pcu_datain[31:0] | 32 | In | Read data bus from each PEP subblock to the PCU
pcu_rwn | 1 | Out | Common read/not-write signal from the PCU
pcu_<unit>_sel | 1 | Out | Block select for each PEP block from the PCU. Decoded from the 4 most significant bits of cpu_adr[15:2]. When pcu_<unit>_sel is high both pcu_adr and pcu_dataout are valid
<unit>_pcu_rdy | 1 | In | Ready signal from each PEP block to the PCU. When <unit>_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on <unit>_pcu_datain is valid.
DIU Read Interface signals
pcu_diu_rreq | 1 | Out | PCU requests DRAM read. A read request must be accompanied by a valid read address.
pcu_diu_radr[21:5] | 17 | Out | Read address to DIU. 17 bits wide (256-bit aligned word).
diu_pcu_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on pcu_diu_radr
diu_data[63:0] | 64 | In | Data from DIU to PCU. First 64-bits is bits 63:0 of 256 bit word; second 64-bits is bits 127:64 of 256 bit word; third 64-bits is bits 191:128 of 256 bit word; fourth 64-bits is bits 255:192 of 256 bit word
diu_pcu_rvalid | 1 | In | Signal from DIU telling PCU that valid read data is on the diu_data bus

[2753] 21.7.2 Configuration Registers

TABLE 140 PCU Configuration Registers
Address (PCU_base+) | Register | # bits | Reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the PCU. This register can be read to indicate the reset state: 0 - reset in progress; 1 - reset not in progress
0x04 | CmdAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | The address of the next set of commands to retrieve from DRAM. When this register is written to, either by the CPU or DRAM command, 1 is also written to CmdSource to cause the execution of the commands at the specified address.
0x08 | BandSelectMask[2:0] | 3 | 0x0 | Selects which input finishedBand flags are to be watched to generate the combined pcu_finishedband signal. Bit0 - lbd_finishedband; Bit1 - cdu_finishedband; Bit2 - te_finishedband
0x0C, 0x10, 0x14, 0x18 | NextBandCmdAdr[3:0][21:5] (256-bit aligned DRAM address) | 4 × 17 | 0x00000 | The address to transfer to CmdAdr as soon as possible after the next finishedBand[n] signal has been received as long as NextBandCmdEnable[n] is set. A write from the PCU to NextBandCmdAdr[n] with a non-zero value also sets NextBandCmdEnable[n]. A write from the PCU to NextBandCmdAdr[n] with a 0 value clears NextBandCmdEnable[n].
0x1C | NextCmdAdr[21:5] | 17 | 0x00000 | The address to transfer to CmdAdr when the CPU pending bit (CmdPending[4]) gets serviced. A write from the PCU to NextCmdAdr[n] with a non-zero value also sets CmdPending[4]. A write from the PCU to NextCmdAdr[n] with a 0 value clears CmdPending[4]
0x20 | CmdSource | 1 | 0x0 | 0 - commands are taken from the CPU; 1 - commands are taken from the CPU as well as DRAM at CmdAdr.
0x24 | DebugSelect[15:2] | 14 | 0x0000 | Debug address select. Indicates the address of the register to report on the pcu_cpu_data bus when it is not otherwise being used, and the PEP bus is not being used. Bits [15:12] select the unit (see Table ); Bits [11:2] select the register within the unit
Work registers (read only)
0x28 | InvalidAddress[21:3] (64-bit aligned DRAM) | 19 | 0 | DRAM Address of current 64-bit command attempting to execute. Read only register.
0x2C | CmdPending | 5 | 0x00 | For each bit n, where n is 0 to 3: 0 - no commands pending for NextBandCmdAdr[n]; 1 - commands pending for NextBandCmdAdr[n]. For bit 4: 0 - no commands pending for NextCmdAdr[n]; 1 - commands pending for NextCmdAdr[n]. Read only register.
0x34 | FinishedSoFar | 3 | 0x0 | The appropriate bit is set whenever the corresponding input finishedBand flag is set and the corresponding bit in the BandSelectMask is also set. If all FinishedSoFar bits are set wherever BandSelect bits are also set, all FinishedSoFar bits are cleared and the output pcu_finishedband signal is given. Read only register.
0x38 | NextBandCmdEnable | 4 | 0x0 | This register can be written to indirectly (i.e. the bits are set or cleared via writes to NextBandCmdAdr[n]). For each bit: 0 - do nothing at the next finishedBand[n] signal; 1 - execute instructions at NextBandCmdAdr[n] as soon as possible after receipt of the next finishedBand[n] signal. Bit0 - lbd_finishedband; Bit1 - cdu_finishedband; Bit2 - te_finishedband; Bit3 - pcu_finishedband. Read only register.
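The FinishedSoFar behaviour described in Table 140 can be sketched as an update function. The bit assignments (Bit0 lbd, Bit1 cdu, Bit2 te) are from the table; the function name, the tuple return value, and the choice to generate no pcu_finishedband pulse when BandSelectMask is zero are assumptions.

```python
def update_finished_so_far(finished_so_far: int, finished_band: int,
                           band_select_mask: int):
    """Return (new FinishedSoFar, pcu_finishedband) for one update.

    finished_band holds the input flags: bit0 lbd, bit1 cdu, bit2 te.
    """
    mask = band_select_mask & 0b111
    finished_so_far |= finished_band & mask   # only watched flags are latched
    if mask != 0 and finished_so_far == mask:
        return 0, True                        # all watched units done: clear and signal
    return finished_so_far, False


# Watch LBD and CDU only (mask 0b011).
state, pulse1 = update_finished_so_far(0, 0b001, 0b011)      # LBD finishes
state, pulse2 = update_finished_so_far(state, 0b100, 0b011)  # TE finishes, unwatched
state, pulse3 = update_finished_so_far(state, 0b010, 0b011)  # CDU finishes
```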

[2754] 21.8 Detailed Description

[2755] 21.8.1 PEP Blocks Register Map

[2756] All PEP accesses are 32-bit register accesses.

[2757] From Table 140 it can be seen that four bits only are necessary to address each of the sub-blocks within the PEP part of SoPEC. Up to 14 bits may be used to address any configurable 32-bit register within PEP. This gives scope for 1024 configurable registers per sub-block. This address will come either from the CPU or from a command stored in DRAM. The bus is assembled as follows:

[2758] adr[15:12]=sub-block address

[2759] adr[n:2]=32-bit register address within sub-block, only the number of bits required to decode the registers within each sub-block are used.

TABLE 141 PEP blocks Register Map
Block | Block Select Decode = cpu_adr[15:12]
PCU | 0x0
CDU | 0x1
CFU | 0x2
LBD | 0x3
SFU | 0x4
TE | 0x5
TFU | 0x6
HCU | 0x7
DNC | 0x8
DWU | 0x9
LLU | 0xA
PHI | 0xB
Reserved | 0xC to 0xF
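The address decode of Table 141 can be illustrated with a short sketch. The block select values are taken from the table; the function name, and returning None for the reserved range, are illustrative assumptions.

```python
PEP_BLOCKS = {
    0x0: "PCU", 0x1: "CDU", 0x2: "CFU", 0x3: "LBD",
    0x4: "SFU", 0x5: "TE", 0x6: "TFU", 0x7: "HCU",
    0x8: "DNC", 0x9: "DWU", 0xA: "LLU", 0xB: "PHI",
}


def decode_pep_address(adr: int):
    """Split a 16-bit PEP address into (block name, 32-bit register index).

    adr[15:12] selects the sub-block; adr[11:2] selects one of up to
    1024 registers. Reserved block selects (0xC to 0xF) give None.
    """
    block_sel = (adr >> 12) & 0xF
    reg_index = (adr >> 2) & 0x3FF
    return PEP_BLOCKS.get(block_sel), reg_index


cdu_reg = decode_pep_address(0x1008)   # block select 0x1, byte offset 0x008
reserved = decode_pep_address(0xC000)  # block select in the reserved range
```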

[2760] 21.8.2 Internal PCU PEP Protocol

[2761] The PCU performs PEP configuration register accesses via a selectsignal, pcu_<block>_sel. The read/write sense of the access iscommunicated via the pcu_rwn signal (1=read, 0=write). Write data isclocked out, and read data clocked in upon receipt of the appropriateselect-read/write-address combination.

[2762] FIG. 133 shows a write operation followed by a read operation. The read operation is shown with wait states while the PEP block returns the read data.

[2763] For access to the PEP blocks a simple bus protocol is used. The PCU first determines which particular PEP block is being addressed so that the appropriate block select signal can be generated. During a write access PCU write data is driven out with the address and block select signals in the first cycle of an access. The addressed PEP block responds by asserting its ready signal indicating that it has registered the write data and the access can complete. The write data bus is common to all PEP blocks.

[2764] A read access is initiated by driving the address and select signals during the first cycle of an access. The addressed PEP block responds by placing the read data on its bus and asserting its ready signal to indicate to the PCU that the read data is valid. Each block has a separate point-to-point data bus for read accesses to avoid the need for a tri-stateable bus.

[2765] Consecutive accesses to a PEP block must be separated by at least a single cycle, during which the select signal must be de-asserted.

[2766] 21.8.3 PCU DRAM Access Requirements

[2767] The PCU can execute register programming commands stored in DRAM. These commands can be executed at the start of a print run to initialize all the registers of PEP. The PCU can also execute instructions at the start of a page, and between bands. In the inter-band time, it is critical to have the PCU operate as fast as possible. Therefore in the inter-page and inter-band time the PCU needs to get low latency access to DRAM.

[2768] A typical band change requires on the order of 4 commands to restart each of the CDU, LBD, and TE, followed by a single command to terminate the DRAM command stream. This is on the order of 5 commands per restart component.

[2769] The PCU does single 256-bit reads from DRAM. Each PCU command is 64 bits so each 256-bit DRAM read can contain 4 PCU commands. The requested command is read from DRAM together with the next 3 contiguous 64-bit words, which are cached to avoid unnecessary DRAM reads. Writing zero to CmdSource causes the PCU to flush commands and terminate program access from DRAM for that command stream. The PCU requires a 256-bit buffer to hold the 4 PCU commands read by each 256-bit DRAM access. When the buffer is empty the PCU can request DRAM access again. Adding a 256-bit double buffer would allow the next set of 4 commands to be fetched from DRAM while the current commands are being executed.

[2770] 1024 commands of 64 bits require 8 kB of DRAM storage.
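The command sizes quoted above can be checked with a short calculation. This is an illustrative sketch only; the helper names are hypothetical, and the read count assumes the command stream starts on a 256-bit boundary.

```python
COMMAND_BITS = 64     # each PCU command is 64 bits
DRAM_READ_BITS = 256  # the PCU does single 256-bit reads

COMMANDS_PER_READ = DRAM_READ_BITS // COMMAND_BITS


def program_size_bytes(num_commands):
    """DRAM storage needed for a PCU program of num_commands commands."""
    return num_commands * COMMAND_BITS // 8


def dram_reads_needed(num_commands):
    """256-bit reads needed to fetch num_commands commands
    (ceiling division; assumes a 256-bit aligned start address)."""
    return -(-num_commands // COMMANDS_PER_READ)
```

For the typical band change described earlier (about 4 commands for each of three units plus one terminating command, 13 in total), this gives 4 DRAM reads under the stated alignment assumption.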

[2771] Programs stored in DRAM are referred to as PCU Program Code.

[2772] 21.8.4 End of Band Unit

[2773] The state machine is responsible for watching the various input xx_finishedband signals, setting the FinishedSoFar flags, and outputting the pcu_finishedband flags as specified by the BandSelect register.

[2774] Each cycle, the end of band unit performs the following tasks:
    pcu_finishedband = (FinishedSoFar[0] == BandSelectMask[0]) AND
                       (FinishedSoFar[1] == BandSelectMask[1]) AND
                       (FinishedSoFar[2] == BandSelectMask[2]) AND
                       (BandSelectMask[0] OR BandSelectMask[1] OR BandSelectMask[2])
    if (pcu_finishedband == 1) then
        FinishedSoFar[0] = 0
        FinishedSoFar[1] = 0
        FinishedSoFar[2] = 0
    else
        FinishedSoFar[0] = (FinishedSoFar[0] OR lbd_finishedband) AND BandSelectMask[0]
        FinishedSoFar[1] = (FinishedSoFar[1] OR cdu_finishedband) AND BandSelectMask[1]
        FinishedSoFar[2] = (FinishedSoFar[2] OR te_finishedband) AND BandSelectMask[2]

[2775] Note that it is the responsibility of the microcode at the start of printing a page to ensure that all 3 FinishedSoFar bits are cleared. It is not necessary to clear them between bands since this happens automatically.

[2776] If a bit of BandSelectMask is cleared, then the corresponding bit of FinishedSoFar has no impact on the generation of pcu_finishedband.
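The per-cycle behaviour of the end of band unit can be modelled in software as follows. This is a minimal sketch of the pseudocode in the paragraphs above; the function name end_of_band_step and the list-based representation (index 0 = LBD, 1 = CDU, 2 = TE) are assumptions made for illustration.

```python
def end_of_band_step(finished_so_far, band_select_mask, finished_inputs):
    """Model one clock cycle of the end of band unit.

    Each argument is a list of three 0/1 values ordered [lbd, cdu, te].
    Returns (pcu_finishedband, next_finished_so_far).
    """
    all_match = all(f == m for f, m in zip(finished_so_far, band_select_mask))
    # The output only fires if at least one mask bit is set.
    pcu_finishedband = int(all_match and any(band_select_mask))
    if pcu_finishedband:
        next_state = [0, 0, 0]  # flags clear automatically between bands
    else:
        next_state = [
            (f | x) & m
            for f, x, m in zip(finished_so_far, finished_inputs, band_select_mask)
        ]
    return pcu_finishedband, next_state
```

With BandSelectMask set to select only the LBD and CDU, the output fires in the cycle after both units have signalled, and a TE signal has no effect.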

[2777] 21.8.5 Executing Commands From DRAM

[2778] Registers in PEP can be programmed by means of simple 64-bit commands fetched from DRAM. The format of the commands is given in Table 142. Register locations can have a data value of up to 32 bits. Commands are PEP register write commands only.

TABLE 142 Register write commands in PEP
command | bits 63-32 | bits 31-16 | bits 15-2 | bits 1-0
Register write | data | zero | 32-bit word address | zero
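The field layout of Table 142 can be illustrated by packing and unpacking a command as a 64-bit integer. This is a sketch only: the function names are hypothetical, and the byte order in which the 64-bit value is stored in DRAM is not addressed here.

```python
def encode_write_command(address, data):
    """Build a 64-bit PEP register write command per Table 142.

    address: 16-bit PEP byte address, 32-bit aligned (bits 1-0 zero).
    data:    32-bit register write value (bits 63-32 of the command).
    """
    if not 0 <= address <= 0xFFFF or address & 0x3:
        raise ValueError("address must be a 32-bit aligned 16-bit value")
    if not 0 <= data <= 0xFFFFFFFF:
        raise ValueError("data must fit in 32 bits")
    # Bits 31-16 are left as zero.
    return (data << 32) | address


def decode_write_command(command):
    """Return (address, data) from a 64-bit register write command."""
    address = command & 0xFFFC            # bits 15-2, bits 1-0 forced to zero
    data = (command >> 32) & 0xFFFFFFFF   # bits 63-32
    return address, data
```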

[2779] Due attention must be paid to the endianness of the processor. The LEON processor is a big-endian processor (bit 7 is the most significant bit).

[2780] 21.8.6 General Operation

[2781] Upon a Reset condition, CmdSource is cleared (to 0), which means that all commands are initially sourced only from the CPU bus interface. Registers can then be written to or read from one location at a time via the CPU bus interface.

[2782] If CmdSource is 1, commands are sourced from the DRAM at CmdAdr and from the CPU bus.

[2783] Writing an address to CmdAdr automatically sets CmdSource to 1, and causes a command stream to be retrieved from DRAM. The PCU will execute commands from the CPU or from the DRAM command stream, giving higher priority to the CPU always.

[2784] If CmdSource is 0 the DRAM requester examines the CmdPending bits to determine if a new DRAM command stream is pending. If any of the CmdPending bits are set, then the appropriate NextBandCmdAdr or NextCmdAdr is copied to CmdAdr (causing CmdSource to get set to 1) and a new DRAM command stream is retrieved from DRAM and executed by the PCU. If there are multiple pending commands the DRAM requester will service the lowest numbered pending bit first. Note that a new DRAM command stream only gets retrieved when the current command stream is empty.

[2785] If there are no DRAM commands pending, and no CPU commands, the PCU defaults to an idle state. When idle the PCU address bus defaults to the DebugSelect register value (bits 11 to 2 in particular) and the default unit PCU data bus is reflected to the CPU data bus. The default unit is determined by the DebugSelect register bits 15 to 12.

[2786] In conjunction with this, upon receipt of a finishedBand[n] signal, NextBandCmdEnable[n] is copied to CmdPending[n] and NextBandCmdEnable[n] is cleared. Note, each of the LBD, CDU, and TE (where present) may be re-programmed individually between bands by appropriately setting NextBandCmdAdr[2-0] respectively. However, execution of inter-band commands may be postponed until all blocks specified in the BandSelectMask register have pulsed their finishedband signal. This may be accomplished by only setting NextBandCmdAdr[3] (indirectly causing NextBandCmdEnable[3] to be set) in which case it is the pcu_finishedband signal which causes NextBandCmdEnable[3] to be copied to CmdPending[3].
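The interaction between NextBandCmdEnable, CmdPending and the lowest-bit-first servicing order can be sketched as a small software model. The class and method names below are hypothetical. Two behaviours are assumptions rather than statements from the description: the copy on finishedBand[n] is modelled literally as an overwrite, and a pending bit is cleared when it is selected for service.

```python
class PendingModel:
    """Toy model of NextBandCmdEnable[3:0] and CmdPending[4:0]."""

    def __init__(self):
        self.next_band_cmd_enable = [0, 0, 0, 0]
        # Bits 0-3 correspond to NextBandCmdAdr[n], bit 4 to NextCmdAdr.
        self.cmd_pending = [0, 0, 0, 0, 0]

    def write_next_band_cmd_adr(self, n):
        # Writing NextBandCmdAdr[n] indirectly sets NextBandCmdEnable[n].
        self.next_band_cmd_enable[n] = 1

    def write_next_cmd_adr(self):
        # Writing NextCmdAdr sets CmdPending[4] directly.
        self.cmd_pending[4] = 1

    def finished_band(self, n):
        # Enable bit is copied to the pending bit, then cleared.
        self.cmd_pending[n] = self.next_band_cmd_enable[n]
        self.next_band_cmd_enable[n] = 0

    def next_to_service(self):
        """Return the lowest numbered pending bit, or None if idle.
        Clearing the bit on selection is an assumption of this sketch."""
        for bit, pending in enumerate(self.cmd_pending):
            if pending:
                self.cmd_pending[bit] = 0
                return bit
        return None
```

In this model a program queued through NextCmdAdr (bit 4) is serviced only after any pending inter-band programs on bits 0 to 3.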

[2787] To conveniently update multiple registers, for example at the start of printing a page, a series of Write Register commands can be stored in DRAM. When the start address of the first Write Register command is written to the CmdAdr register (via the CPU), the CmdSource register is automatically set to 1 to actually start the execution at CmdAdr. Alternatively the CPU can write to NextCmdAdr causing the CmdPending[4] bit to get set, which will then get serviced by the DRAM requester in the pending bit arbitration order.

[2788] The final instruction in the command block stored in DRAM must be a register write of 0 to CmdSource so that no more commands are read from DRAM. Subsequent commands will come from pending programs or can be sent via the CPU bus interface.

[2789] 21.8.6.1 Debug Mode

[2790] Debug mode is implemented by reusing the normal CPU and DRAM access decode logic. When in the Arbitrate state (see state machine A below), the PEP address bus is defaulted to the value in the DebugSelect register. The top bits of the DebugSelect register are used to decode a select to a PEP unit and the remaining bits are reflected on the PEP address bus. The selected unit's read data bus is reflected on the pcu_cpu_data bus to the RDU in the CPU. The pcu_cpu_debug_valid signal indicates to the RDU that the data on the pcu_cpu_data bus is valid debug data.

[2791] Normal CPU and DRAM command access will require the PEP bus, and as such will cause the debug data to be invalid during the access. This is indicated to the RDU by setting pcu_cpu_debug_valid to zero.

[2792] The decode logic is:
    // Default Debug decode
    if state == Arbitrate then
        if (cpu_pcu_sel == 1 AND cpu_acode /= SUPERVISOR_DATA_MODE) then
            pcu_cpu_debug_valid = 0    // bus error condition
            pcu_cpu_data = 0
        else
            <unit> = decode(DebugSelect[15:12])
            if (<unit> == PCU) then
                pcu_cpu_data = Internal PCU register
            else
                pcu_cpu_data = <unit>_pcu_datain[31:0]
            pcu_adr[11:2] = DebugSelect[11:2]
            pcu_cpu_debug_valid = 1 AFTER 4 clock cycles
    else
        pcu_cpu_debug_valid = 0

[2793] 21.8.7 State Machines

[2794] DRAM command fetching and general command execution is accomplished using two state machines. State machine A evaluates whether a CPU or DRAM command is being executed, and proceeds to execute the command(s). Since the CPU has priority over the DRAM it is permitted to interrupt the execution of a stream of DRAM commands.

[2795] Machine B decides which address should be used for DRAM access, fetches commands from DRAM and fills a command fifo which A executes. The reason for separating the two functions is to facilitate the execution of CPU or Debug commands while state machine B is performing DRAM reads and filling the command fifo. In the case where state machine A is ready to execute commands (in its Arbitrate state) and it sees both a full DRAM command fifo and an active cpu_pcu_sel then the DRAM commands are executed last.

[2796] 21.8.7.1 State Machine A: Arbitration and Execution of Commands

[2797] The state-machine enters the Reset state when there is an active strobe on either the reset pin, prst_n, or the PCU's soft-reset register. All registers in the PCU are zeroed, unless otherwise specified, on the next rising clock edge. The PCU self-deasserts the soft reset in the pclk cycle after it has been asserted.

[2798] The state changes from Reset to Arbitrate when prst_n==1 and PCU_softreset==1.

[2799] The state-machine waits in the Arbitrate state until it detects a request for CPU access to the PEP units (cpu_pcu_sel==1 and cpu_acode==11) or a request to execute DRAM commands (CmdSource==1) while DRAM commands are available (CmdFifoFull==1). Note that if cpu_pcu_sel==1 and cpu_acode !=11, the CPU is attempting an illegal access. The PCU ignores this command and strobes the cpu_pcu_berr for one cycle.

[2800] While in the Arbitrate state the machine assigns the DebugSelect register to the PCU unit decode logic and the remaining bits to the PEP address bus. When in this state the debug data returned from the selected PEP unit is reflected on the CPU bus (pcu_cpu_data bus) and pcu_cpu_debug_valid=1.

[2801] If a CPU access request is detected (cpu_pcu_sel==1 and cpu_acode==11) then the machine proceeds to the CpuAccess state. In the CpuAccess state the cpu address is decoded and used to determine the PEP unit to select. The remaining address bits are passed through to the PEP address bus. The machine remains in the CpuAccess state until a valid ready from the selected PEP unit is received. When received the machine returns to the Arbitrate state, and the ready signal to the CPU is pulsed.
    // decode the logic
    pcu_<unit>_sel = decode(cpu_adr[15:12])
    pcu_adr[11:2] = cpu_adr[11:2]

[2802] The CPU is prevented from generating an invalid PEP unit address (prevented in the MMU) and so CPU accesses cannot generate an invalid address error.

[2803] If the state machine detects a request to execute DRAM commands (CmdSource==1), it will wait in the Arbitrate state until commands have been loaded into the command FIFO from DRAM (all controlled by state machine B). When the DRAM commands are available (cmd_fifo_full==1) the state machine will proceed to the DRAMAccess state.

[2804] When in the DRAMAccess state the commands are executed from the cmd_fifo. A command in the cmd_fifo consists of 64 bits (the FIFO holds 4 such commands). The decoding of the 64 bits to commands is given in Table. For each command the decode is
    // DRAM command decode
    pcu_<unit>_sel = decode( cmd_fifo[cmd_count][15:12] )
    pcu_adr[11:2]  = cmd_fifo[cmd_count][11:2]
    pcu_dataout    = cmd_fifo[cmd_count][63:32]

[2805] When the selected PEP unit returns a ready signal (<unit>_pcu_rdy==1) indicating the command has completed, the state machine will return to the Arbitrate state. If more commands exist (cmd_count !=0) the transition will decrement the command count.

[2806] When in the DRAMAccess state, if the DRAM command address bus (cmd_fifo[cmd_count][15:12]) decodes to a reserved address, the state machine proceeds to the AdrError state, and then back to the Arbitrate state. An address error interrupt will be generated and the DRAM command FIFOs will be cleared.

[2807] A CPU access can pre-empt any pending DRAM commands. After each command is completed the state machine returns to the Arbitrate state. If a CPU access is required and a DRAM command stream is executing the CPU access always takes priority. If a CPU or DRAM command sets the CmdSource to 0, all subsequent DRAM commands in the command FIFO are cleared. If the CPU sets the CmdSource to 0 the CmdPending and NextBandCmdEnable work registers are also cleared.
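The priority rule applied by state machine A in the Arbitrate state can be sketched as a pure function. This is an illustrative model only: the function name is hypothetical, cpu_acode==11 is read here as the binary value 0b11, and an illegal CPU access is assumed to leave the machine in Arbitrate for that cycle.

```python
SUPERVISOR_DATA = 0b11  # assumed encoding of the legal cpu_acode value


def arbitrate(cpu_pcu_sel, cpu_acode, cmd_source, cmd_fifo_full):
    """Return (next_state, cpu_pcu_berr) for one cycle in Arbitrate."""
    if cpu_pcu_sel:
        if cpu_acode == SUPERVISOR_DATA:
            return "CpuAccess", 0   # CPU always wins over DRAM commands
        return "Arbitrate", 1       # illegal access: ignore, strobe bus error
    if cmd_source and cmd_fifo_full:
        return "DRAMAccess", 0      # execute the next command from cmd_fifo
    return "Arbitrate", 0           # idle: debug data is reflected to the CPU
```

Because the machine returns to Arbitrate after every command, a CPU request arriving mid-stream is taken before the next DRAM command, which is the pre-emption behaviour described above.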

[2808] 21.8.7.2 State Machine B: Fetching DRAM Commands

[2809] A system reset (prst_n==0) or a software reset (pcu_softreset_n==0) will cause the state machine to reset to the Reset state. The state machine remains in the Reset state until both reset conditions are removed. When removed the machine proceeds to the Wait state.

[2810] The state machine waits in the Wait state until it determines that commands are needed from DRAM. Two possible conditions exist that require DRAM access. Either the PCU is processing commands which must be fetched from DRAM (cmd_source==1), and the command FIFO is empty (cmd_fifo_full==0), or the cmd_source==0 and the command FIFO is empty and there are some commands pending (cmd_pending !=0). In either of these conditions the machine proceeds to the Ack state and issues a read request to DRAM (pcu_diu_rreq==1). It calculates the address to read from dependent on the transition condition. In the command pending transition condition, the highest priority NextBandCmdAdr (or NextCmdAdr) that is pending is used for the read address (pcu_diu_radr) and is also copied to the CmdAdr register. If multiple pending bits are set the lowest pending bits are serviced first. In the normal PCU processing transition the pcu_diu_radr is the CmdAdr register.

[2811] When an acknowledge is received from the DRAM the state machine goes to the FillFifo state. In the FillFifo state the machine waits for the DRAM to respond to the read request and transfer data words. On receipt of the first word of data (diu_pcu_rvalid==1), the machine stores the 64-bit data word in the command FIFO (cmd_fifo[3]) and transitions to the Data1, Data2, Data3 states, each time waiting for a diu_pcu_rvalid==1 and storing the transferred data word to cmd_fifo[2], cmd_fifo[1] and cmd_fifo[0] respectively.

[2812] When the transfer is complete the machine returns to the Wait state, setting the cmd_count to 3, the cmd_fifo_full is set to 1 and the CmdAdr is incremented.
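The FIFO fill order and the resulting execution order can be illustrated as follows. This is a minimal sketch assuming, as the paragraphs above suggest, that commands are executed from cmd_fifo[cmd_count] with cmd_count counting down from 3; the function names are hypothetical.

```python
def fill_cmd_fifo(words):
    """Store four 64-bit data words, given in arrival order, into the FIFO.

    The first word received goes to cmd_fifo[3], the last to cmd_fifo[0].
    Returns (cmd_fifo, cmd_count) with cmd_count set to 3.
    """
    if len(words) != 4:
        raise ValueError("a 256-bit DRAM read delivers exactly four words")
    cmd_fifo = [0, 0, 0, 0]
    for slot, word in zip((3, 2, 1, 0), words):
        cmd_fifo[slot] = word
    return cmd_fifo, 3


def execution_order(cmd_fifo, cmd_count):
    """List commands in the order they would be executed."""
    order = []
    while True:
        order.append(cmd_fifo[cmd_count])
        if cmd_count == 0:
            break
        cmd_count -= 1  # decremented on each return to Arbitrate
    return order
```

Under these assumptions the commands execute in the same order in which they were read from DRAM.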

[2813] If the CPU sets the CmdSource register low while the PCU is in the middle of a DRAM access, the state machine returns to the Wait state and the DRAM access is aborted.

[2814] 21.8.7.3 PCU_ICU_Address_Invalid Interrupt

[2815] When the PCU is executing commands from DRAM, addresses decoded from commands which are not PCU mapped addresses (4-bits only) will result in the current command being ignored and the pcu_icu_address_invalid interrupt signal being strobed. When an invalid command occurs all remaining commands already retrieved from DRAM are flushed from the CmdFifo, and the CmdPending, NextBandCmdEnable and CmdSource registers are cleared to zero.

[2816] The CPU can then interrogate the PCU to find the source of the illegal DRAM command via the InvalidAddress register.

[2817] The CPU is prevented by the MMU from generating an invalid address command.

[2818] 22 Contone Decoder Unit (CDU)

[2819] 22.1 Overview

[2820] The Contone Decoder Unit (CDU) is responsible for performing the optional decompression of the contone data layer.

[2821] The input to the CDU is up to 4 planes of compressed contone data in JPEG interleaved format. This will typically be 3 planes, representing a CMY contone image, or 4 planes representing a CMYK contone image. The CDU must support a page of A4 length (11.7 inches) and Letter width (8.5 inches) at a resolution of 267 ppi in 4 colors and a print speed of 1 side per 2 seconds.

[2822] The CDU and the other page expansion units support the notion of page banding. A compressed page is divided into one or more bands, with a number of bands stored in memory. As a band of the page is consumed for printing a new band can be downloaded. The new band may be for the current page or the next page. Band-finish interrupts have been provided to notify the CPU of free buffer space.

[2823] The compressed contone data is read from the on-chip DRAM. The output of the CDU is the decompressed contone data, separated into planes. The decompressed contone image is written to a circular buffer in DRAM with an expected minimum size of 12 lines and a configurable maximum. The decompressed contone image is subsequently read a line at a time by the CFU, optionally color converted, scaled up to 1600 ppi and then passed on to the HCU for the next stage in the printing pipeline. The CDU also outputs a cdu_finishedband control flag indicating that the CDU has finished reading a band of compressed contone data in DRAM and that area of DRAM is now free. This flag is used by the PCU and is available as an interrupt to the CPU.

[2824] 22.2 Storage Requirements for Decompressed Contone Data in DRAM

[2825] A single SoPEC must support a page of A4 length (11.7 inches) and Letter width (8.5 inches) at a resolution of 267 ppi in 4 colors and a print speed of 1 side per 2 seconds. The printheads specified in the Bi-lithic Printhead Specification [2] have 13824 nozzles per color to provide full bleed printing for A4 and Letter. At 267 ppi, there are 2304 contone pixels⁹ per line represented by 288 JPEG blocks per color. However each of these blocks actually stores data for 8 lines, since a single JPEG block is 8×8 pixels. The CDU produces contone data for 8 lines in parallel, while the HCU processes data linearly across a line on a line by line basis. The contone data is decoded only once and then buffered in DRAM. This means we require two sets of 8 buffer lines: one set of 8 buffer lines is being consumed by the CFU while the other set of 8 buffer lines is being generated by the CDU.

[2826] The buffer requirement can be reduced by using a 1.5 buffering scheme, where the CDU fills 8 lines while the CFU consumes 4 lines. The buffer space required is a minimum of 12 line stores per color, for a total space of 108 KBytes¹⁰. A circular buffer scheme is employed whereby the CDU may only begin to write a line of JPEG blocks (equals 8 lines of contone data) when there are 8 lines free in the buffer. Once the full 8 lines have been written by the CDU, the CFU may begin to read them on a line by line basis.

[2827] This reduction in buffering comes with the cost of an increased peak bandwidth requirement for the CDU write access to DRAM. The CDU must be able to write the decompressed contone at twice the rate at which the CFU reads the data. To allow for trade-offs to be made between peak bandwidth and amount of storage, the size of the circular buffer is configurable. For example, if the circular buffer is configured to be 16 lines it behaves like a double-buffer scheme where the peak bandwidth requirements of the CDU and CFU are equal. An increase over 16 lines allows the CDU to write ahead of the CFU and provides it with a margin to cope with very poor local compression ratios in the image.

[2828] SoPEC should also provide support for A3 printing and printing at resolutions above 267 ppi. This increases the storage requirement for the decompressed contone data (buffer) in DRAM. Table 143 gives the storage requirements for the decompressed contone data at some sample contone resolutions for different page sizes. It assumes 4 color planes of contone data and a 1.5 buffering scheme.

TABLE 143 Storage requirements for decompressed contone data (buffer)
Page size | Contone resolution (ppi) | Scale factor^(a) | Pixels per line | Storage required (kBytes)
A4/Letter^(b) | 267 | 6 | 2304 | 108^(d)
A4/Letter^(b) | 400 | 4 | 3456 | 162
A4/Letter^(b) | 800 | 2 | 6912 | 324
A3^(c) | 267 | 6 | 3248 | 152.25
A3^(c) | 400 | 4 | 4872 | 228.37
A3^(c) | 800 | 2 | 9744 | 456.75
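The storage figures in Table 143 follow from the line length, the number of color planes and the 12-line minimum buffer. The sketch below reproduces them under the assumption of one byte per contone pixel per plane and 1 kByte = 1024 bytes; the function names are hypothetical.

```python
def pixels_per_line(dots_per_line, scale_factor):
    """Contone pixels per line for a given 1600 dpi line length."""
    return dots_per_line // scale_factor


def contone_buffer_kbytes(pixels, planes=4, buffer_lines=12):
    """Decompressed contone buffer size in kBytes (1.5 buffering scheme)."""
    return pixels * planes * buffer_lines / 1024


a4_pixels = pixels_per_line(13824, 6)   # 13824 nozzles per color, SF = 6
jpeg_blocks_per_line = a4_pixels // 8   # a JPEG block is 8 pixels wide
```

For example, contone_buffer_kbytes(2304) evaluates to 108.0 and contone_buffer_kbytes(4872) to 228.375, which the table lists as 228.37.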

[2829] 22.3 Decompression Performance Requirements

[2830] The JPEG decoder core can produce a single color pixel every system clock (pclk) cycle, making it capable of decoding at a peak output rate of 8 bits/cycle. SoPEC processes 1 dot (bi-level in 6 colors) per system clock cycle to achieve a print speed of 1 side per 2 seconds for full bleed A4/Letter printing. The CFU replicates pixels a scale factor (SF) number of times in both the horizontal and vertical directions to convert the final output to 1600 ppi. Thus the CFU consumes a 4 color pixel (32 bits) every SF x SF cycles. The 1.5 buffering scheme described in section 22.2 on page 327 means that the CDU must write the data at twice this rate. With support for 4 colors at 267 ppi, the decompression output bandwidth requirement is 1.78 bits/cycle¹¹.

[2831] The JPEG decoder is fed directly from the main memory via the DRAM interface. The amount of compression determines the input bandwidth requirements for the CDU. As the level of compression increases, the bandwidth decreases, but the quality of the final output image can also decrease. Although the average compression ratio for contone data is expected to be 10:1, the average bandwidth allocated to the CDU allows for a local minimum compression ratio of 5:1 over a single line of JPEG blocks. This equates to a peak input bandwidth requirement of 0.36 bits/cycle for 4 colors at 267 ppi, full bleed A4/Letter printing at 1 side per 2 seconds.

[2832] Table 144 gives the decompression output bandwidth requirements for different resolutions of contone data to meet a print speed of 1 side per 2 seconds. Higher resolution requires higher bandwidth and larger storage for decompressed contone data in DRAM. A resolution of 400 ppi contone data in 4 colors requires 4 bits/cycle¹², which is practical using a 1.5 buffering scheme. However, a resolution of 800 ppi would require a double buffering scheme (16 lines) so the CDU only has to match the CFU consumption rate. In this case the decompression output bandwidth requirement is 8 bits/cycle¹³, the limiting factor being the output rate of the JPEG decoder core.

TABLE 144 CDU performance requirements for full bleed A4/Letter printing at 1 side per 2 seconds
Contone resolution (ppi) | Scale factor | Decompression output bandwidth requirement (bits/cycle)^(a)
267 | 6 | 1.78
400 | 4 | 4
800 | 2 | 8^(b)
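The bandwidth figures in Table 144 can be reproduced from the CFU consumption rate of one 32-bit pixel every SF x SF cycles. The sketch below is illustrative; the function name and the write_rate_multiplier parameter (2 for the 1.5 buffering scheme, 1 for double buffering) are assumptions introduced for this example.

```python
def cdu_output_bits_per_cycle(scale_factor, planes=4, write_rate_multiplier=2):
    """Peak decompression output bandwidth in bits per pclk cycle.

    The CFU consumes planes * 8 bits every scale_factor**2 cycles; the CDU
    must write at write_rate_multiplier times that rate.
    """
    cfu_rate = planes * 8 / (scale_factor * scale_factor)
    return write_rate_multiplier * cfu_rate


# Peak input bandwidth at a local minimum compression ratio of 5:1.
peak_input_267ppi = cdu_output_bits_per_cycle(6) / 5
```

At 267 ppi the output requirement is 64/36, or about 1.78 bits/cycle, and the corresponding input requirement at 5:1 compression is about 0.36 bits/cycle.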

[2833] 22.4 Data Flow

[2834] FIG. 136 shows the general data flow for contone data: compressed contone planes are read from DRAM by the CDU, and the decompressed contone data is written to the 12-line circular buffer in DRAM. The line buffers are subsequently read by the CFU.

[2835] The CDU allows the contone data to be passed directly on, which will be the case if the color represented by each color plane in the JPEG image is an available ink. For example, the four colors may be C, M, Y, and K, directly represented by CMYK inks. The four colors may represent gold, metallic green etc. for multi-SoPEC printing with exact colors.

[2836] However JPEG produces better compression ratios for a given visible quality when luminance and chrominance channels are separated. With CMYK, K can be considered to be luminance, but C, M, and Y each contain luminance information, and so would need to be compressed with appropriate luminance tables. We therefore provide the means by which CMY can be passed to SoPEC as YCrCb. K does not need color conversion. When being JPEG compressed, CMY is typically converted to RGB, then to YCrCb and then finally JPEG compressed. At decompression, the YCrCb data is obtained and written to the decompressed contone store by the CDU. This is read by the CFU where the YCrCb can then be optionally color converted to RGB, and finally back to CMY.

[2837] The external RIP provides conversion from RGB to YCrCb, specifically to match the actual hardware implementation of the inverse transform within SoPEC, as per CCIR 601-2 [24] except that Y, Cr and Cb are normalized to occupy all 256 levels of an 8-bit binary encoding.

[2838] The CFU provides the translation to either RGB or CMY. RGB is included since it is a necessary step to produce CMY, and some printers increase their color gamut by including RGB inks as well as CMYK.
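The YCrCb to RGB to CMY path can be illustrated with a floating point sketch. The coefficients below are the commonly used full-range (JFIF style) form of the CCIR 601 inverse transform, and the RGB to CMY step is taken as a simple 8-bit complement. Both are assumptions made for illustration; the exact arithmetic of the SoPEC hardware is not given in this section.

```python
def _clamp8(value):
    """Round and clamp to the 0..255 range of an 8-bit sample."""
    return max(0, min(255, int(round(value))))


def ycrcb_to_rgb(y, cr, cb):
    """Full-range YCrCb (all 256 levels used) to 8-bit RGB.
    Assumed JFIF-style coefficients, not confirmed for SoPEC."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp8(r), _clamp8(g), _clamp8(b)


def rgb_to_cmy(rgb):
    """Assumed simple complement: C = 255 - R, M = 255 - G, Y = 255 - B."""
    r, g, b = rgb
    return 255 - r, 255 - g, 255 - b
```

A neutral input with Cr = Cb = 128 maps to equal R, G and B values, so a mid-grey sample yields equal C, M and Y amounts.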

[2839] 22.5 Implementation

[2840] A block diagram of the CDU is shown in FIG. 137.

[2841] All output signals from the CDU (cdu_cfu_wradv8line, cdu_finishedband, cdu_icu_jpegerror, and control signals to the DIU) must always be valid after reset. If the CDU is not currently decoding, cdu_cfu_wradv8line, cdu_finishedband and cdu_icu_jpegerror will always be 0.

[2842] The read control unit is responsible for keeping the JPEG decoder's input FIFO full by reading compressed contone bytestream from external DRAM via the DIU, and produces the cdu_finishedband signal. The write control unit accepts the output from the JPEG decoder a half JPEG block (32 bytes) at a time, writes it into a double-buffer, and writes the double buffered decompressed half blocks to DRAM via the DIU, interacting with the CFU in order to share DRAM buffers.

[2843] 22.5.1 Definitions of I/O

TABLE 145 CDU port list and description
Port name | Pins | I/O | Description
Clocks and reset
pclk | 1 | In | System clock.
jclk | 1 | In | Gated version of system clock used to clock the JPEG decoder core and logic at the output of the core. Allows for stalling of the JPEG core at a pixel sample boundary.
jclk_enable | 1 | Out | Gating signal for jclk.
prst_n | 1 | In | System reset, synchronous active low.
jrst_n | 1 | In | Reset for jclk domain, synchronous active low.
PCU interface
pcu_cdu_sel | 1 | In | Block select from the PCU. When pcu_cdu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[7:2] | 6 | In | PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
cdu_pcu_rdy | 1 | Out | Ready signal to the PCU. When cdu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on cdu_pcu_datain is valid.
cdu_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU.
DIU read interface
cdu_diu_rreq | 1 | Out | CDU read request, active high. A read request must be accompanied by a valid read address.
diu_cdu_rack | 1 | In | Acknowledge from DIU, active high. Indicates that a read request has been accepted and the new read address can be placed on the address bus, cdu_diu_radr.
cdu_diu_radr[21:5] | 17 | Out | CDU read address. 17 bits wide (256-bit aligned word).
diu_cdu_rvalid | 1 | In | Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data.
diu_data[63:0] | 64 | In | Read data from DRAM.
DIU write interface
cdu_diu_wreq | 1 | Out | CDU write request, active high. A write request must be accompanied by a valid write address and valid write data.
diu_cdu_wack | 1 | In | Acknowledge from DIU, active high. Indicates that a write request has been accepted and the new write address can be placed on the address bus, cdu_diu_wadr.
cdu_diu_wadr[21:3] | 19 | Out | CDU write address. 19 bits wide (64-bit aligned word).
cdu_diu_wvalid | 1 | Out | Write data valid, active high. Indicates that valid data is now on the write data bus, cdu_diu_data.
cdu_diu_data[63:0] | 64 | Out | Write data bus.
CFU interface
cfu_cdu_rdadvline | 1 | In | Read line pulse, active high. Indicates that the CFU has finished reading a line of decompressed contone data from the circular buffer in DRAM and that line of the buffer is now free.
cdu_cfu_linestore_rdy | 1 | Out | Indicates if the contone line store has 1 or more lines available to read by the CFU.
TE and LBD interface
cdu_start_of_bandstore[21:5] | 17 | Out | Points to the 256-bit word that defines the start of the memory area allocated for page bands.
cdu_end_of_bandstore[21:5] | 17 | Out | Points to the 256-bit word that defines the last address of the memory area allocated for page bands.
ICU interface
cdu_finishedband | 1 | Out | CDU's finishedBand flag, active high. Interrupt to the CPU to indicate that the CDU has finished processing a band of compressed contone data in DRAM and that area of DRAM is now free. This signal goes to both the interrupt controller and the PCU.
cdu_icu_jpegerror | 1 | Out | Active high interrupt indicating an error has occurred in the JPEG decoding process and decompression has stopped. A reset of the CDU must be performed to clear this interrupt.

[2844] 22.5.2 Configuration Registers

[2845] The configuration registers in the CDU are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for the description of the protocol and timing diagrams for reading and writing registers in the CDU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the CDU. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of cdu_pcu_datain.

[2846] Since the CDU, LBD and TE all access the page band store, they share two registers that enable sequential memory accesses to the page band stores to be circular in nature. Table 146 lists these two registers.

TABLE 146 Registers shared between the CDU, LBD, and TE
Address (CDU_base+) | Register name | #bits | Value on reset | Description
Setup registers (remain constant during the processing of multiple bands)
0x80 | StartOfBandStore[21:5] | 17 | 0x0_0000 | Points to the 256-bit word that defines the start of the memory area allocated for page bands. Circular address generation wraps to this start address.
0x84 | EndOfBandStore[21:5] | 17 | 0x1_3FFF | Points to the 256-bit word that defines the last address of the memory area allocated for page bands. If the current read address is from this address, then instead of adding 1 to the current address, the current address will be loaded from the StartOfBandStore register.
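The circular address generation described for these two registers can be sketched as follows. Addresses are expressed in 256-bit word units, matching the [21:5] register fields; the function name is hypothetical.

```python
def next_band_store_address(current, start_of_band_store, end_of_band_store):
    """Return the next 256-bit word address for a sequential band store read.

    When the current address equals EndOfBandStore the address wraps to
    StartOfBandStore instead of incrementing.
    """
    if current == end_of_band_store:
        return start_of_band_store
    return current + 1
```

With the reset values given in Table 146, a read at word address 0x1_3FFF is followed by a read at word address 0x0_0000.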

[2847] The software reset logic should include a circuit to ensure that both the pclk and jclk domains are reset regardless of the state of the jclk_enable when the reset is initiated.

[2848] The CDU contains the following additional registers:

TABLE 147 CDU registers

Address (CDU_base+) | Register name | #bits | Value on reset | Description

Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the CDU. This terminates all internal operations within the CS6150. All configuration data previously loaded into the core except for the tables is deleted.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the CDU. Writing 0 to this register halts the CDU. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). NextBandEnable is cleared when Go is asserted. The CFU must be started before the CDU is started. Go must remain low for at least 384 jclk cycles after a hardware reset (prst_n = 0) to allow the JPEG core to complete its memory initialisation sequence. This register can be read to determine if the CDU is running (1 - running, 0 - stopped).

Setup registers
0x0C | NumLinesAvail | 7 | 0x0 | The number of image lines of data that there is space available for in the decompressed data buffer in DRAM. If this drops below 8 the CDU will stall. In normal operation this value will start off at NumBuffLines and will be decremented by 8 whenever the CDU writes a line of JPEG blocks (8 lines of data) to DRAM and incremented by 1 whenever the CFU reads a line of data from DRAM. NumLinesAvail can be overwritten by the CPU to prevent the CDU from stalling.
0x10 | MaxPlane | 2 | 0x0 | Defines the number of contone planes - 1. For example, this will be 0 for K (greyscale printing), 2 for CMY, and 3 for CMYK.
0x14 | MaxBlock | 13 | 0x000 | Number of JPEG MCUs (or JPEG block equivalents, i.e. 8 × 8 bytes) in a line - 1.
0x18 | BuffStartAdr[21:7] | 15 | 0x0000 | Points to the start of the decompressed contone circular buffer in DRAM, aligned to a half JPEG block boundary. A half JPEG block consists of 4 words of 256 bits, enough to hold 32 contone pixels in 4 colors.
0x1C | BuffEndAdr[21:7] | 15 | 0x0000 | Points to the start of the last half JPEG block at the end of the decompressed contone circular buffer in DRAM, aligned to a half JPEG block boundary. A half JPEG block consists of 4 words of 256 bits, enough to hold 32 contone pixels in 4 colors.
0x20 | NumBuffLines[6:2] | 5 | 0x03 | Defines the size of the buffer in DRAM in terms of the number of decompressed contone lines. The size of the buffer should be a multiple of 4 lines with a minimum size of 8 lines.
0x24 | BypassJpg | 1 | 0x0 | Determines whether or not the JPEG decoder will be bypassed (and hence pixels are copied directly from input to output). 0 - don't bypass, 1 - bypass. Should not be changed between bands.
0x30 | NextBandCurrSourceAdr[21:5] | 17 | 0x0_0000 | The 256-bit aligned word address containing the start of the next band of compressed contone data in DRAM. This value is copied to CurrSourceAdr when both DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.
0x34 | NextBandEndSourceAdr[21:3] | 19 | 0x0_0000 | The 64-bit aligned word address containing the last bytes of the next band of compressed contone data in DRAM. This value is copied to EndSourceAdr when both DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.
0x38 | NextBandValidBytesLastFetch | 3 | 0x0 | Indicates the number of valid bytes - 1 in the last 64-bit fetch of the next band of compressed contone data from DRAM, e.g. 0 implies bits 7:0 are valid, 1 implies bits 15:0 are valid, 7 implies all bits 63:0 are valid. This value is copied to ValidBytesLastFetch when both DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.
0x3C | NextBandEnable | 1 | 0x0 | When NextBandEnable is 1 and DoneBand is 1: NextBandCurrSourceAdr is copied to CurrSourceAdr, NextBandEndSourceAdr is copied to EndSourceAdr, NextBandValidBytesLastFetch is copied to ValidBytesLastFetch, DoneBand is cleared, and NextBandEnable is cleared. NextBandEnable is cleared when Go is asserted. Note that DoneBand gets cleared regardless of the state of Go.

Read-only registers
0x40 | DoneBand | 1 | 0x0 | Specifies whether or not the current band has finished loading into the local FIFO. It is cleared to 0 when Go transitions from 0 to 1. When the last of the compressed contone data for the band has been loaded into the local FIFO, the cdu_finishedband signal is given out and the DoneBand flag is set. If NextBandEnable is 1 at this time then CurrSourceAdr, EndSourceAdr and ValidBytesLastFetch are updated with the values for the next band and DoneBand is cleared. Processing of the next band starts immediately. If NextBandEnable is 0 then the remainder of the CDU will continue to run, decompressing the data already loaded, while the read control unit waits for NextBandEnable to be set before it restarts.
0x44 | CurrSourceAdr[21:5] | 17 | 0x0_0000 | The current 256-bit aligned word address within the current band of compressed contone data in DRAM.
0x48 | EndSourceAdr[21:3] | 19 | 0x0_0000 | The 64-bit aligned word address containing the last bytes of the current band of compressed contone data in DRAM.
0x4C | ValidBytesLastFetch | 3 | 0x00 | Indicates the number of valid bytes - 1 in the last 64-bit fetch of the current band of compressed contone data from DRAM, e.g. 0 implies bits 7:0 are valid, 1 implies bits 15:0 are valid, 7 implies all bits 63:0 are valid.

JPEG decoder core setup registers
0x50 | JpgDecMask | 5 | 0x00 | As segments are decoded they can also be output on the DecJpg (JpgDecHdr) port, with the user selecting the segments for output by setting bits in the JpgDecMask port as follows: bit 4 - SOF + SOS + DNL; bit 3 - COM + APP; bit 2 - DRI; bit 1 - DQT; bit 0 - DHT. If any one of the bits of JpgDecMask is asserted then the SOI and EOI markers are also passed to the DecJpg port.
0x54 | JpgDecTType | 1 | 0x0 | Test type selector: 0 - DCT coefficients displayed on JpgDecTData; 1 - QDCT coefficients displayed on JpgDecTData.
0x58 | JpgDecTestEn | 1 | 0x0 | Signal which causes the memories to be bypassed for test purposes.
0x5C | JpgDecPType | 4 | 0x0 | Signal specifying parameters to be placed on port JpgDecPValue (see Table).

JPEG decoder core read-only status registers
0x60 | JpgDecHdr | 8 | 0x00 | Selected header segments from the JPEG stream that is currently being decoded. Segments are selected using JpgDecMask.
0x64 | JpgDecTData | 13 | 0x0000 | Bit 12 - TSOS output of CS6150, indicates the first output byte of the first 8 × 8 block of the test data. Bit 11 - TSOB output of CS6150, indicates the first output byte of each 8 × 8 block of test data. Bits 10-0 - 11-bit output test data port, displays DCT coefficients or quantized coefficients depending on the value of JpgDecTType.
0x68 | JpgDecPValue | 16 | 0x0000 | Decoding parameter bus which enables various parameters used by the core to be read. The data available on the PValue port is for information only, and does not contain control signals for the decoder core.
0x6C | JpgDecStatus | 24 | 0x00_0000 | Bit 23 - jpg_core_stall (if set, indicates that the JPEG core is stalled by gating of jclk as the output JPEG half-block double-buffers of the CDU are full). Bit 22 - pix_out_valid (this signal is an output from the JPEG decoder core and is asserted when a pixel is being output). Bits 21-16 - fifo_contents (number of bytes in the compressed contone FIFO at the input of the CDU which feeds the JPEG decoder core). Bits 15-0 are JPEG decoder status outputs from the CS6150 (see Table for description of bits).

[2849] 22.5.3 Typical Operation

[2850] The CDU should only be started after the CFU has been started.

[2851] For the first band of data, users set up NextBandCurrSourceAdr, NextBandEndSourceAdr, NextBandValidBytesLastFetch, and the MaxPlane, MaxBlock, BuffStartAdr, BuffEndAdr and NumBuffLines registers. Users then set the CDU's Go bit to start processing of the band. When the compressed contone data for the band has finished being read in, the cdu_finishedband interrupt will be sent to the PCU and CPU indicating that the memory associated with the first band is now free. Processing can now start on the next band of contone data.

[2852] In order to process the next band, NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch need to be updated before finally writing a 1 to NextBandEnable. There are 4 mechanisms for restarting the CDU between bands:

[2853] a. cdu_finishedband causes an interrupt to the CPU. The CDU will have set its DoneBand bit. The CPU reprograms the NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers, and sets NextBandEnable to restart the CDU.

[2854] b. The CPU programs the CDU's NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers and sets the NextBandEnable bit before the end of the current band. At the end of the current band the CDU sets DoneBand. As NextBandEnable is already 1, the CDU starts processing the next band immediately.

[2855] c. The PCU is programmed so that cdu_finishedband triggers the PCU to execute commands from DRAM to reprogram the NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers and set the NextBandEnable bit to start the CDU processing the next band. The advantage of this scheme is that the CPU could process band headers in advance and store the band commands in DRAM ready for execution.

[2856] d. This is a combination of b and c above. The PCU (rather than the CPU in b) programs the CDU's NextBandCurrSourceAdr, NextBandEndSourceAdr and NextBandValidBytesLastFetch registers and sets the NextBandEnable bit before the end of the current band. At the end of the current band the CDU sets DoneBand and pulses cdu_finishedband. As NextBandEnable is already 1, the CDU starts processing the next band immediately. Simultaneously, cdu_finishedband triggers the PCU to fetch commands from DRAM. The CDU will have restarted by the time the PCU has fetched commands from DRAM. The PCU commands program the CDU's next band shadow registers and set the NextBandEnable bit.
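
The hand-off between the next-band shadow registers and the current-band registers, common to mechanisms a to d, can be sketched in software. The following is a minimal behavioural model only, not the hardware implementation; the class and method names (BandRegs, CduBandModel, program_next, finish_band) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class BandRegs:
    curr_source_adr: int = 0
    end_source_adr: int = 0
    valid_bytes_last_fetch: int = 0


class CduBandModel:
    """Toy model of NextBand* shadow registers, DoneBand and NextBandEnable."""

    def __init__(self):
        self.next = BandRegs()   # NextBandCurrSourceAdr, NextBandEndSourceAdr, ...
        self.curr = BandRegs()   # CurrSourceAdr, EndSourceAdr, ValidBytesLastFetch
        self.next_band_enable = 0
        self.done_band = 0
        self.go = 0

    def set_go(self):
        # Go 0 -> 1: next-band values are copied, DoneBand and NextBandEnable cleared.
        if not self.go:
            self.curr = replace(self.next)
            self.done_band = 0
            self.next_band_enable = 0
        self.go = 1

    def program_next(self, curr, end, valid):
        # CPU or PCU writes the shadow registers, then sets NextBandEnable.
        self.next = BandRegs(curr, end, valid)
        self.next_band_enable = 1
        self._maybe_advance()

    def finish_band(self):
        # Last compressed data of the band has been loaded into the FIFO.
        self.done_band = 1
        self._maybe_advance()

    def _maybe_advance(self):
        # The copy happens only when DoneBand and NextBandEnable are both 1.
        if self.done_band and self.next_band_enable:
            self.curr = replace(self.next)
            self.done_band = 0
            self.next_band_enable = 0
```

In this model, calling finish_band before program_next corresponds to mechanism a, and calling program_next before finish_band corresponds to mechanisms b and d.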

[2857] If an error occurs in the JPEG stream, the JPEG decoder will suspend its operation, an error bit will be set in the JpgDecStatus register and the core will ignore any input data and await a reset before starting decoding again. An interrupt is sent to the CPU by asserting cdu_icu_jpegerror and the CDU should then be reset by means of a write to its Reset register before a new page can be printed.

[2858] 22.5.4 Read Control Unit

[2859] The read control unit is responsible for reading the compressed contone data and passing it to the JPEG decoder via the FIFO. The compressed contone data is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4 clock cycles (64 bits per cycle). The protocol and timing for read accesses to DRAM are described in section 20.9.1 on page 240. Read accesses to DRAM are implemented by means of the state machine described in FIG. 138. All counters and flags should be cleared after reset. When Go transitions from 0 to 1 all counters and flags should take their initial value. While the Go bit is set, the state machine relies on the DoneBand bit to tell it whether to attempt to read a band of compressed contone data. When DoneBand is set, the state machine does nothing. When DoneBand is clear, the state machine continues to load data into the JPEG input FIFO up to 256 bits at a time while there is space available in the FIFO. Note that the state machine has no knowledge about numbers of blocks or numbers of color planes; it merely keeps the JPEG input FIFO full by consecutive reads from DRAM. The DIU is responsible for ensuring that DRAM requests are satisfied at least at the peak DRAM read bandwidth of 0.36 bits/cycle (see section 22.3 on page 329).

[2860] A modulo 4 counter, rd_count, is used to count each of the 64-bit values received in a 256-bit read access. It is incremented whenever diu_cdu_rvalid is asserted. As each 64-bit value is returned, indicated by diu_cdu_rvalid being asserted, curr_source_adr is compared to both end_source_adr and end_of_bandstore:

[2861] If {curr_source_adr, rd_count} equals end_source_adr, the end_of_band control signal sent to the FIFO is 1 (to signify the end of the band), the finishedCDUBand signal is output, and the DoneBand bit is set. The remaining 64-bit values in the burst from the DIU are ignored, i.e. they are not written into the FIFO.

[2862] If rd_count equals 3 and {curr_source_adr, rd_count} does not equal end_source_adr, then curr_source_adr is updated to be either start_of_bandstore or curr_source_adr+1, depending on whether curr_source_adr also equals end_of_bandstore. The end_of_band control signal sent to the FIFO is 0.
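
The two address rules above can be illustrated with a short behavioural sketch. It assumes that {curr_source_adr, rd_count} denotes bit concatenation (the 256-bit word address followed by the 2-bit count, giving a 64-bit aligned address); the function name read_band and the max_words guard are hypothetical additions for illustration.

```python
def read_band(curr_source_adr, end_source_adr,
              start_of_bandstore, end_of_bandstore, max_words=1 << 17):
    """Return (word address, rd_count, end_of_band) for each 64-bit value
    written to the FIFO while reading one band."""
    written = []
    for _ in range(max_words):
        for rd_count in range(4):
            # {curr_source_adr, rd_count}: 64-bit aligned address of this value.
            last = ((curr_source_adr << 2) | rd_count) == end_source_adr
            written.append((curr_source_adr, rd_count, int(last)))
            if last:
                # Remaining values of the burst are ignored.
                return written
        # rd_count reached 3 without hitting the end: advance, wrapping
        # from end_of_bandstore back to start_of_bandstore.
        if curr_source_adr == end_of_bandstore:
            curr_source_adr = start_of_bandstore
        else:
            curr_source_adr += 1
    raise RuntimeError("end_source_adr was never reached")
```

For example, a band starting at word 6 of a band store spanning words 0 to 7, and ending at the second 64-bit value of word 0, is read as words 6, 7 and then 0, with only the first two values of word 0 written to the FIFO.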

[2863] curr_source_adr is output to the DIU as cdu_diu_radr.

[2864] A count is kept of the number of 64-bit values in the FIFO. When diu_cdu_rvalid is 1 and ignore_data is 0, data is written to the FIFO by asserting FifoWr, and fifo_contents[3:0] and fifo_wr_adr[2:0] are both incremented.

[2865] When fifo_contents[3:0] is greater than 0, jpg_in_strb is asserted to indicate that there is data available in the FIFO for the JPEG decoder core. The JPEG decoder core asserts jpg_in_rdy when it is ready to receive data from the FIFO. Note it is also possible to bypass the JPEG decoder core by setting the BypassJpg register to 1. In this case data is sent directly from the FIFO to the half-block double-buffer. While the JPEG decoder is not stalled (jpg_core_stall equal to 0), and jpg_in_rdy (or bypass_jpg) and jpg_in_strb are both 1, a byte of data is consumed by the JPEG decoder core. fifo_rd_adr[5:0] is then incremented to select the next byte. The read address is byte aligned, i.e. the upper 3 bits are input as the read address for the FIFO and the lower 3 bits are used to select a byte from the 64 bits. If fifo_rd_adr[2:0]=111 then the next 64-bit value is read from the FIFO by asserting fifo_rd, and fifo_contents[3:0] is decremented.

[2866] 22.5.5 Compressed Contone FIFO

[2867] The compressed contone FIFO is conceptually a 64-bit input, 8-bit output FIFO, to account for the 64-bit data transfers from the DIU and the 8-bit requirement of the JPEG decoder.

[2868] In reality, the FIFO is 8 entries deep and 65 bits wide (to accommodate two 256-bit accesses), with bits 63-0 carrying data, and bit 64 containing a 1-bit end_of_band flag. Whenever 64-bit data is written to the FIFO from the DIU, an end_of_band flag is also passed in from the read control unit. The end_of_band bit is 1 if this is the last data transfer for the current band, and 0 if it is not the last transfer. When end_of_band=1 during an input, the ValidBytesLastFetch register is also copied to an image version of the same.

[2869] On the JPEG decoder side of the FIFO, the read address is byte aligned, i.e. the upper 3 bits are input as the read address for the FIFO and the lower 3 bits are used to select a byte from the 64 bits (first byte corresponds to bits 7-0, second byte to bits 15-8 etc.). If bit 64 is set on the read, bits 63-0 contain the end of the bytestream for that band, and only the bytes specified by the image of ValidBytesLastFetch are valid bytes to be read and presented to the JPEG decoder. Note that ValidBytesLastFetch is copied to an image register as it may be possible for the CDU to be reprogrammed for the next band before the previous band's compressed contone data has been read from the FIFO (as an additional effect of this, the CDU has a non-problematic limitation in that each band of contone data must be more than 4×64 bits, or 32 bytes, in length).
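
As an illustration of the byte selection described above, the following sketch unpacks a sequence of 65-bit FIFO entries into the byte stream seen by the JPEG decoder. It is a simplified software model under the stated byte ordering (bits 7-0 first); the function name fifo_bytes is hypothetical.

```python
def fifo_bytes(entries, valid_bytes_last_fetch):
    """entries: list of (data64, end_of_band) pairs in FIFO order.
    valid_bytes_last_fetch: number of valid bytes - 1 in the final entry."""
    out = bytearray()
    for data, end_of_band in entries:
        # Only the last entry of a band may be partially valid.
        count = valid_bytes_last_fetch + 1 if end_of_band else 8
        for i in range(count):
            # Lower 3 bits of the read address select the byte, LSB first.
            out.append((data >> (8 * i)) & 0xFF)
    return bytes(out)
```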

[2870] 22.5.6 CS6150 JPEG Decoder

[2871] JPEG decoder functionality is implemented by means of a modified version of the Amphion CS6150 JPEG decoder core. The decoder is run at a nominal clock speed of 160 MHz. (Amphion have stated that the CS6150 JPEG decoder core can run at 185 MHz in 0.13 um technology). The core is clocked by jclk, which is a gated version of the system clock pclk. Gating the clock provides a mechanism for stalling the JPEG decoder on a single color pixel-by-pixel basis. Control of the flow of output data is also provided by the PixOutEnab input to the JPEG decoder. However, this only allows stalling of the output at a JPEG block boundary and is insufficient for SoPEC. Thus gating of the clock is employed and PixOutEnab is instead tied high.

[2872] The CS6150 decoder automatically extracts all relevant parameters from the JPEG bytestream and uses them to control the decoding of the image. The JPEG bytestream contains data for the Huffman tables, quantization tables, restart interval definition and frame and scan headers. The decoder parses and checks the JPEG bytestream, automatically detecting and processing all the JPEG marker segments. After identifying the JPEG segments the decoder redirects the data to the appropriate units to be stored or processed. Any errors detected in the bytestream, apart from those in the entropy coded segments, are signalled and, if an error is found, the decoder stops reading the JPEG stream and waits to be reset.

[2873] JPEG images must have their data stored in interleaved format with no subsampling. Images longer than 65536 lines are allowed: these must have an initial imageHeight of 0. If the image has a Define Number Lines (DNL) marker at the end (normally necessary for standard JPEG, but not necessary for SoPEC's version of the CS6150), it must be equal to the total image height mod 64 k or an error will be generated.

[2874] See the CS6150 Databook [21] for more details on how the core is used, and for timing diagrams of the interfaces. Note that [21] does not describe the use of the DNL marker in images of more than 64 k lines in length, as this is a modification to the core.

[2875] The CS6150 decoder can be bypassed by setting the BypassJpg register. If this register is set, then the data read from DRAM must be in the same format as if it was produced by the JPEG decoder: 8×8 blocks of pixels in the correct color order. The data is uncompressed and is therefore lossless.

[2876] The following subsections describe the means by which the CS6150 internals can be made visible.

[2877] 22.5.6.1 JPEG Decoder Reset

[2878] The JPEG decoder has 2 possible types of reset, an asynchronous reset and a synchronous clear. In SoPEC the asynchronous reset is connected to the hardware synchronous reset of the CDU and can be activated by any hardware reset to SoPEC (either from an external pin or from any of the wake-up sources, e.g. USB activity, Wake-up register timeout) or by resetting the PEP section (ResetSection register in the CPR block).

[2879] The synchronous clear is connected to the software reset of the CDU and can be activated by the low to high transition of the Go register, or a software reset via the Reset register.

[2880] The 2 types of reset differ in that the asynchronous reset resets the JPEG core and causes the core to enter a memory initialization sequence that takes 384 clock cycles to complete after the reset is deasserted. The synchronous clear resets the core, but leaves the memory as is. This has some implications for programming the CDU.

[2881] In general the CDU should not be started (i.e. setting Go to 1) until at least 384 cycles after a hardware reset. If the CDU is started before then, the memory initialization sequence will be terminated, leaving the JPEG core memory in an unknown state. This is allowed if the memory is to be initialized from the incoming JPEG stream.

[2882] 22.5.6.2 JPEG Decoder Parameter Bus

[2883] The decoding parameter bus JpgDecPValue is a 16-bit port used to output various parameters extracted from the input data stream and currently used by the core. The 4-bit selector input (JpgDecPType) determines which internal parameters are displayed on the parameter bus as per Table 148. The data available on the PValue port does not contain control signals used by the CS6150.

TABLE 148 Parameter bus definitions

PType | Output orientation | PValue
0x0 | FY[15:0] | FY: number of lines in frame
0x1 | FX[15:0] | FX: number of columns in frame
0x2 | 00_YMCU[13:0] | YMCU: number of MCUs in Y direction of the current scan
0x3 | 00_XMCU[13:0] | XMCU: number of MCUs in X direction of the current scan
0x4 | Cs0[7:0]_Tq0[1:0]_V0[2:0]_H0[2:0] | Cs0: identifier for the first scan component. Tq0: quantization table identifier for the first scan component. V0: vertical sampling factor for the first scan component, values = 1-4. H0: horizontal sampling factor for the first scan component, values = 1-4.
0x5 | Cs1[7:0]_Tq1[1:0]_V1[2:0]_H1[2:0] | Cs1, Tq1, V1 and H1 for the second scan component. V1, H1 undefined if NS < 2.
0x6 | Cs2[7:0]_Tq2[1:0]_V2[2:0]_H2[2:0] | Cs2, Tq2, V2 and H2 for the third scan component. V2, H2 undefined if NS < 3.
0x7 | Cs3[7:0]_Tq3[1:0]_V3[2:0]_H3[2:0] | Cs3, Tq3, V3 and H3 for the fourth scan component. V3, H3 undefined if NS < 4.
0x8 | CsH[15:0] | CsH: no. of rows in current scan
0x9 | CsV[15:0] | CsV: no. of columns in current scan
0xA | DRI[15:0] | DRI: restart interval
0xB | 000_HMAX[2:0]_VMAX[2:0]_MCUBLK[3:0]_NS[2:0] | HMAX: maximal horizontal sampling factor in frame. VMAX: maximal vertical sampling factor in frame. MCUBLK: number of blocks per MCU of the current scan, from 1 to 10. NS: number of scan components in current scan, 1-4.
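
A host-side routine that reads JpgDecPValue might unpack the composite fields of Table 148 as sketched below. This is a minimal example that assumes the fields are packed most significant first in the order the table lists them (so Cs occupies bits 15:8 and H bits 2:0); the function names are hypothetical.

```python
def split_fields(value, widths):
    """Split a 16-bit value into fields listed MSB-first with the given widths."""
    assert sum(widths) == 16
    fields, shift = [], 16
    for width in widths:
        shift -= width
        fields.append((value >> shift) & ((1 << width) - 1))
    return fields


def decode_scan_component(pvalue):
    """PType 0x4-0x7: CsN[7:0]_TqN[1:0]_VN[2:0]_HN[2:0]."""
    cs, tq, v, h = split_fields(pvalue, [8, 2, 3, 3])
    return {"Cs": cs, "Tq": tq, "V": v, "H": h}


def decode_sampling(pvalue):
    """PType 0xB: 000_HMAX[2:0]_VMAX[2:0]_MCUBLK[3:0]_NS[2:0]."""
    _, hmax, vmax, mcublk, ns = split_fields(pvalue, [3, 3, 3, 4, 3])
    return {"HMAX": hmax, "VMAX": vmax, "MCUBLK": mcublk, "NS": ns}
```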

[2884] 22.5.6.3 JPEG Decoder Status Register

[2885] The status register flags indicate the current state of the CS6150 operation. When an error is detected during the decoding process, the decompression process in the JPEG decoder is suspended and an interrupt is sent to the CPU by asserting cdu_icu_jpegerror (generated from DecError). The CPU can check the source of the error by reading the JpgDecStatus register. The CS6150 waits until a reset process is invoked by asserting the hard reset prst_n or by a soft reset of the CDU. The individual bits of JpgDecStatus are set to zero at reset and are active high to indicate an error condition as defined in Table 149.

[2886] Note: A DecHfError will not block the input as the core will try to recover and produce the correct amount of pixel data. The DecHfError is cleared automatically at the start of the next image and so no intervention is required from the user. If any of the other errors occur in the decode mode then, following the error cancellation, the core will discard all input data until the next Start Of Image (SOI) without triggering any more errors.

[2887] The progress of the decoding can be monitored by observing the values of TblDef, IDctInProg, DecInProg and JpgInProg.

TABLE 149 JPEG decoder status register definitions

Bit | Name | Description
15-12 | TblDef[7:4] | Indicates the number of Huffman tables defined, 1 bit/table.
11-8 | TblDef[3:0] | Indicates the number of quantization tables defined, 1 bit/table.
7 | DecHfError | Set when an undefined Huffman table symbol is referenced during decoding.
6 | CtlError | Set when an invalid SOF parameter or an invalid SOS parameter is detected. Also set when there is a mismatch between the DNL segment input to the core and the number of lines in the input image which have already been decoded. Note that SoPEC's implementation of the CS6150 does not require a final DNL when the initial setting for ImageHeight is 0. This is to allow images longer than 64k lines.
5 | HtError | Set when an invalid DHT segment is detected.
4 | QtError | Set when an invalid DQT segment is detected.
3 | DecError | Set when anything other than a JPEG marker is input. Set when any of DecFlags[6:4] are set. Set when any data other than the SOI marker is detected at the start of a stream. Set when any SOF marker is detected other than SOF0. Set if an incomplete Huffman or quantization definition is detected.
2 | IDctInProg | Set when IDCT starts processing first data of a scan. Cleared when IDCT has processed the last data of a scan.
1 | DecInProg | For each scan this signal is asserted after the SigSOS (Start of Scan Segment) signal has been output from the core and is deasserted when the decoding of a scan is complete. It indicates that the core is in the decoding state.
0 | JpgInProg | Set when the core starts to process input data (JpgIn) and deasserted when decoding has been completed, i.e. when the last pixel of the last block of the image is output.
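
Software that services the cdu_icu_jpegerror interrupt could decode the 24-bit JpgDecStatus value along the following lines. This is an illustrative sketch: the bit positions follow Table 149 and the JpgDecStatus register description, while the function name decode_status and the dictionary keys for the table masks are hypothetical.

```python
# Single-bit flags of JpgDecStatus[7:0], as listed in Table 149.
FLAG_BITS = {
    "DecHfError": 7, "CtlError": 6, "HtError": 5, "QtError": 4,
    "DecError": 3, "IDctInProg": 2, "DecInProg": 1, "JpgInProg": 0,
}


def decode_status(status):
    """Split a 24-bit JpgDecStatus value into named fields."""
    fields = {name: (status >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["huffman_tables_mask"] = (status >> 12) & 0xF  # TblDef[7:4]
    fields["quant_tables_mask"] = (status >> 8) & 0xF     # TblDef[3:0]
    fields["fifo_contents"] = (status >> 16) & 0x3F       # bits 21-16
    fields["pix_out_valid"] = (status >> 22) & 1
    fields["jpg_core_stall"] = (status >> 23) & 1
    return fields
```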

[2888] 22.5.7 Half-Block Buffer Interface

[2889] Since the CDU writes 256 bits (4×64 bits) to memory at a time, it requires a double-buffer of 2×256 bits at its output. This is implemented in an 8×64 bit FIFO. It is required to be able to stall the JPEG decoder core at its output on a half JPEG block boundary, i.e. after 32 pixels (8 bits per pixel). We provide a mechanism for stalling the JPEG decoder core by gating the clock to the core (with jclk_enable) when the FIFO is full. The output FIFO is responsible for providing two buffered half JPEG blocks to decouple JPEG decoding (read control unit) from writing those JPEG blocks to DRAM (write control unit). Data coming in is in 8-bit quantities but data going out is in 64-bit quantities for a single color plane.

[2890] 22.5.8 Write Control Unit

[2891] A line of JPEG blocks in 4 colors, or 8 lines of decompressed contone data, is stored in DRAM with the memory arrangement shown in FIG. 139. The arrangement optimizes access for reads by writing the data so that 4 color components are stored together in each 256-bit DRAM word.

[2892] The CDU writes 8 lines of data in parallel but stores the first 4 lines and second 4 lines separately in DRAM. The write sequence for a single line of JPEG 8×8 blocks in 4 colors, as shown in FIG. 139, is as follows, and corresponds to the order in which pixels are output from the JPEG decoder core:

[2893] block 0, color 0, line 0 in word p bits 63-0, line 1 in word p+1 bits 63-0,

[2894] line 2 in word p+2 bits 63-0, line 3 in word p+3 bits 63-0,

[2895] block 0, color 0, line 4 in word q bits 63-0, line 5 in word q+1 bits 63-0,

[2896] line 6 in word q+2 bits 63-0, line 7 in word q+3 bits 63-0,

[2897] block 0, color 1, line 0 in word p bits 127-64, line 1 in word p+1 bits 127-64,

[2898] line 2 in word p+2 bits 127-64, line 3 in word p+3 bits 127-64,

[2899] block 0, color 1, line 4 in word q bits 127-64, line 5 in word q+1 bits 127-64,

[2900] line 6 in word q+2 bits 127-64, line 7 in word q+3 bits 127-64,

[2901] repeat for block 0 color 2, block 0 color 3.

[2902] block 1, color 0, line 0 in word p+4 bits 63-0, line 1 in word p+5 bits 63-0, etc. . . .

[2903] block N, color 3, line 4 in word q+4N bits 255-192, line 5 in word q+4N+1 bits 255-192,

[2904] line 6 in word q+4N+2 bits 255-192,

[2905] line 7 in word q+4N+3 bits 255-192
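
The write sequence above follows a regular pattern that can be captured in a small helper. The sketch below maps a (block, color, line) triple to the 256-bit word and bit range listed above, where p and q are the word addresses of the areas holding lines 0-3 and lines 4-7 respectively; the function name pixel_line_location is hypothetical.

```python
def pixel_line_location(block, color, line, p, q):
    """Return (word, (msb, lsb)) for one 8-pixel line of one JPEG block.

    p, q: 256-bit word addresses of the lines 0-3 and lines 4-7 areas.
    """
    assert 0 <= line < 8 and 0 <= color < 4
    base = p if line < 4 else q
    # Each block occupies 4 consecutive words per half, one word per line.
    word = base + 4 * block + (line % 4)
    # Each color owns one 64-bit lane of the 256-bit word.
    lsb = 64 * color
    return word, (lsb + 63, lsb)
```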

[2906] In SoPEC data is written to DRAM 256 bits at a time. The DIU receives a 64-bit aligned address from the CDU, i.e. the lower 2 bits indicate which 64 bits within a 256-bit location are being written to. With that address the DIU also receives half a JPEG block (4 lines) in a single color, 4×64 bits over 4 cycles. All accesses to DRAM must be padded to 256 bits or the bits which should not be written are masked using the individual bit write inputs of the DRAM. When writing decompressed contone data from the CDU, only 64 bits out of the 256-bit access to DRAM are valid, and the remaining bits of the write are masked by the DIU. This means that the decompressed contone data is written to DRAM in 4 back-to-back 64-bit write masked accesses to 4 consecutive 256-bit DRAM locations/words.

[2907] Writing of decompressed contone data to DRAM is implemented by the state machine in FIG. 140. The CDU writes the decompressed contone data to DRAM half a JPEG block at a time, 4×64 bits over 4 cycles. All counters and flags should be cleared after reset. When Go transitions from 0 to 1 all counters and flags should take their initial value. While the Go bit is set, the state machine relies on the half_block_ok_to_read and line_store_ok_to_write flags to tell it whether to attempt to write a half JPEG block to DRAM. Once the half-block buffer interface contains a half JPEG block, the state machine requests a write access to DRAM by asserting cdu_diu_wreq and providing the write address, corresponding to the first 64-bit value to be written, on cdu_diu_wadr (only the address of the first 64-bit value in each access of 4×64 bits is issued by the CDU; the DIU can generate the addresses for the second, third and fourth 64-bit values). The state machine then waits to receive an acknowledge from the DIU before initiating a read of 4×64-bit values from the half-block buffer interface by asserting rd_adv for 4 cycles. The output cdu_diu_wvalid is asserted in the cycle after rd_adv to indicate to the DIU that valid data is present on the cdu_diu_data bus and should be written to the specified address in DRAM. A rd_adv_half_block pulse is then sent to the half-block buffer interface to indicate that the current read buffer has been read and should now be available to be written to again. The state machine then returns to the request state.

[2908] The pseudocode below shows how the write address is calculated on a per clock cycle basis.

[2909] Note counters and flags should be cleared after reset. When Go transitions from 0 to 1 all counters and flags should be cleared and lwr_halfblock_adr gets loaded with buff_start_adr and upr_halfblock_adr gets loaded with buff_start_adr+max_block+1.

    // assign write address output to DRAM
    cdu_diu_wadr[6:5] = 00  // corresponds to linenumber, only first address is
                            // issued for each DRAM access. Thus line is always 0.
                            // The DIU generates these bits of the address.
    cdu_diu_wadr[4:3] = color
    if (half == 1) then
        cdu_diu_wadr[21:7] = upr_halfblock_adr  // for lines 4-7 of JPEG block
    else
        cdu_diu_wadr[21:7] = lwr_halfblock_adr  // for lines 0-3 of JPEG block

    // update half, color, block and addresses after each DRAM write access
    if (rd_adv_half_block == 1) then
        if (half == 1) then
            half = 0
            if (color == max_plane) then
                color = 0
                if (block == max_block) then  // end of writing a line of JPEG blocks
                    pulse wradv8line
                    block = 0
                    // update half block address for start of next line of JPEG blocks taking
                    // account of address wrapping in circular buffer and 4 line offset
                    if (upr_halfblock_adr == buff_end_adr) then
                        upr_halfblock_adr = buff_start_adr + max_block + 1
                    elsif (upr_halfblock_adr + max_block + 1 == buff_end_adr) then
                        upr_halfblock_adr = buff_start_adr
                    else
                        upr_halfblock_adr = upr_halfblock_adr + max_block + 2
                else
                    block ++
                    upr_halfblock_adr ++  // move to address for lines 4-7 for next block
            else
                color ++
        else
            half = 1
            if (color == max_plane) then
                if (block == max_block) then  // end of writing a line of JPEG blocks
                    // update half block address for start of next line of JPEG blocks taking
                    // account of address wrapping in circular buffer and 4 line offset
                    if (lwr_halfblock_adr == buff_end_adr) then
                        lwr_halfblock_adr = buff_start_adr + max_block + 1
                    elsif (lwr_halfblock_adr + max_block + 1 == buff_end_adr) then
                        lwr_halfblock_adr = buff_start_adr
                    else
                        lwr_halfblock_adr = lwr_halfblock_adr + max_block + 2
                else
                    lwr_halfblock_adr ++  // move to address for lines 0-3 for next block
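
The address update rules in the pseudocode can be exercised in software. The following is a sketch of a direct transcription into a small class, with addresses expressed in half JPEG block units (i.e. the [21:7] field); the class name WriteAddressGen and its step method are hypothetical, and one call to step stands for one rd_adv_half_block pulse.

```python
class WriteAddressGen:
    """Software transcription of the CDU write address pseudocode."""

    def __init__(self, buff_start_adr, buff_end_adr, max_block, max_plane):
        self.start, self.end = buff_start_adr, buff_end_adr
        self.max_block, self.max_plane = max_block, max_plane
        self.half = self.color = self.block = 0
        self.lwr = buff_start_adr                   # lines 0-3
        self.upr = buff_start_adr + max_block + 1   # lines 4-7

    def _next_line(self, adr):
        # Start of the next line of JPEG blocks, with circular wrapping
        # and the 4 line offset between the two halves.
        if adr == self.end:
            return self.start + self.max_block + 1
        if adr + self.max_block + 1 == self.end:
            return self.start
        return adr + self.max_block + 2

    def step(self):
        """Return (adr, color, half, wradv8line) for one write, then advance."""
        adr = self.upr if self.half else self.lwr
        issued = (adr, self.color, self.half)
        pulse = False
        last_color = self.color == self.max_plane
        last_block = self.block == self.max_block
        if self.half:
            self.half = 0
            if last_color:
                self.color = 0
                if last_block:
                    pulse = True            # end of a line of JPEG blocks
                    self.block = 0
                    self.upr = self._next_line(self.upr)
                else:
                    self.block += 1
                    self.upr += 1
            else:
                self.color += 1
        else:
            self.half = 1
            if last_color:
                if last_block:
                    self.lwr = self._next_line(self.lwr)
                else:
                    self.lwr += 1
        return issued + (pulse,)
```

With one color plane, two blocks per line and a 16 line buffer (half-block addresses 0 to 7), the generator issues addresses 0, 2, 1, 3, 4, 6, 5, 7 and then wraps back to 0.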

[2910] 22.5.9 Contone Line Store Interface

[2911] The contone line store interface is responsible for providing the control over the shared resource in DRAM. The CDU writes 8 lines of data in up to 4 color planes, and the CFU reads them line-at-a-time. The contone line store interface provides the mechanism for keeping track of the number of lines stored in DRAM, and provides signals so that a given line cannot be read from until the complete line has been written.

[2912] The CDU writes 8 lines of data in parallel but writes the first 4 lines and second 4 lines to separate areas in DRAM. Thus, when the CFU has read 4 lines from DRAM that area becomes free for the CDU to write to. The size of the line store in DRAM should therefore be a multiple of 4 lines.

[2913] The minimum size of the line store interface is 8 lines, providing a single buffer scheme. Typical sizes are 12 lines for a 1.5 buffer scheme, while 16 lines provides a double-buffer scheme.

[2914] The size of the contone line store is defined by num_buff_lines. A count is kept of the number of lines in DRAM that are available to be written to. When Go transitions from 0 to 1, NumLinesAvail is set to the value of num_buff_lines. The CDU may only begin to write to DRAM when there is space available for 8 lines, indicated by the line_store_ok_to_write bit being set.

[2915] When the CDU has finished writing 8 lines, the write control unit sends a wradv8line pulse to the contone line store interface, and NumLinesAvail is decremented by 8. The write control unit then waits for line_store_ok_to_write to be set again.

[2916] If the contone line store is not empty (has one or more lines available in it), the CDU will indicate this to the CFU via the cdu_cfu_linestore_rdy signal. The cdu_cfu_linestore_rdy signal is generated by comparing NumLinesAvail with the programmed num_buff_lines. As the CFU reads a line from the contone line store it will pulse rdadvline to indicate that it has read a full line from the line store. NumLinesAvail is incremented by 1 on receiving a rdadvline pulse.
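
The flow control between the CDU and CFU can be modelled as a single counter. The sketch below is a simplified behavioural model, assuming that cdu_cfu_linestore_rdy is asserted whenever NumLinesAvail is less than num_buff_lines (the text states only that the two values are compared); the class name ContoneLineStore is hypothetical.

```python
class ContoneLineStore:
    """Counter model of NumLinesAvail for the contone line store."""

    def __init__(self, num_buff_lines):
        self.num_buff_lines = num_buff_lines
        # Go 0 -> 1: the whole buffer is free to be written.
        self.num_lines_avail = num_buff_lines

    @property
    def line_store_ok_to_write(self):
        # The CDU needs space for a full line of JPEG blocks (8 lines).
        return self.num_lines_avail >= 8

    @property
    def cdu_cfu_linestore_rdy(self):
        # Assumed comparison: at least one written line is waiting to be read.
        return self.num_lines_avail < self.num_buff_lines

    def wradv8line(self):
        assert self.line_store_ok_to_write
        self.num_lines_avail -= 8

    def rdadvline(self):
        self.num_lines_avail += 1
```

With a 12 line buffer, the CDU can write one line of JPEG blocks immediately and must then wait until the CFU has read 4 lines before writing the next.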

[2917] To enable running the CDU while the CFU is not running, the NumLinesAvail register can also be updated via the configuration register interface. In this scenario the CPU polls the value of the NumLinesAvail register and overwrites it to prevent stalling of the CDU (NumLinesAvail<8). The CPU will always have priority in any updating of the NumLinesAvail register.

[2918] 23 Contone FIFO Unit (CFU)

[2919] 23.1 Overview

[2920] The Contone FIFO Unit (CFU) is responsible for reading the decompressed contone data layer from the circular buffer in DRAM, performing optional color conversion from YCrCb to RGB followed by optional color inversion in up to 4 color planes, and then feeding the data on to the HCU. Scaling of data is performed in the horizontal and vertical directions by the CFU so that the output to the HCU matches the printer resolution. Non-integer scaling is supported in both the horizontal and vertical directions. Typically, the scale factor will be the same in both directions but may be programmed to be different.

[2921] 23.2 Bandwidth Requirements

[2922] The CFU must read the contone data from DRAM fast enough to match the rate at which the contone data is consumed by the HCU.

[2923] Pixels of contone data are replicated a X scale factor (SF)number of times in the X direction and Y scale factor (SF) number oftimes in the Y direction to convert the final output to 1600 dpi.Replication in the X direction is performed at the output of the CFU ona pixel-by-pixel basis while replication in the Y direction is performedby the CFU reading each line a number of times, according to the Y-scalefactor, from DRAM. The HCU generates 1 dot (bi-level in 6 colors) persystem clock cycle to achieve a print speed of 1 side per 2 seconds forfull bleed A4/Letter printing. The CFU output buffer needs to besupplied with a 4 color contone pixel (32 bits) every SF cycles. Withsupport for 4 colors at 267 ppi the CFU must read data from DRAM at 5.33bits/cycle¹⁴.

[2924] 23.3 Color Space Conversion

[2925] The CFU allows the contone data to be passed directly on, which will be the case if the color represented by each color plane in the JPEG image is an available ink. For example, the four colors may be C, M, Y, and K, directly represented by CMYK inks. Alternatively, the four colors may represent gold, metallic green etc. for multi-SoPEC printing with exact colors.

[2926] JPEG produces better compression ratios for a given visible quality when luminance and chrominance channels are separated. With CMYK, K can be considered to be luminance, but C, M and Y each contain luminance information and so would need to be compressed with appropriate luminance tables. We therefore provide the means by which CMY can be passed to SoPEC as YCrCb. K does not need color conversion.

[2927] When being JPEG compressed, CMY is typically converted to RGB, then to YCrCb and then finally JPEG compressed. At decompression, the YCrCb data is obtained, then color converted to RGB, and finally back to CMY.

[2928] The external RIP provides conversion from RGB to YCrCb, specifically to match the actual hardware implementation of the inverse transform within SoPEC, as per CCIR 601-2 [24] except that Y, Cr and Cb are normalized to occupy all 256 levels of an 8-bit binary encoding.

[2929] The CFU provides the translation to either RGB or CMY. RGB is included since it is a necessary step to produce CMY, and some printers increase their color gamut by including RGB inks as well as CMYK.

[2930] Consequently the JPEG stream in the color space converter is one of:

[2931] 1 color plane, no color space conversion

[2932] 2 color planes, no color space conversion

[2933] 3 color planes, no color space conversion

[2934] 3 color planes YCrCb, conversion to RGB

[2935] 4 color planes, no color space conversion

[2936] 4 color planes YCrCbX, conversion of YCrCb to RGB, no color conversion of X

[2937] The YCrCb to RGB conversion is described in [14]. Note that if the data is non-compressed, there is no specific advantage in performing color conversion (although the CDU and CFU do permit it).

[2938] 23.4 Color Space Inversion

[2939] In addition to performing optional color conversion the CFU also provides for optional bit-wise inversion in up to 4 color planes. This provides the means by which the conversion to CMY may be finalised, or may be used to provide planar correlation of the dither matrices.

[2940] The RGB to CMY conversion is given by the relationship:

[2941] C=255−R

[2942] M=255−G

[2943] Y=255−B

[2944] These relationships require the page RIP to calculate the RGB from CMY as follows:

[2945] R=255−C

[2946] G=255−M

[2947] B=255−Y
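For 8-bit values, 255−v is the same as inverting every bit of v, which is why the conversion to CMY can be finalised by a per-plane bit-wise inversion. The following Python sketch illustrates this. The function name invert_planes and the mask layout (bit i controls color plane i, following the InvertColorPlane register description) are illustrative assumptions, not part of the hardware description.

```python
def invert_planes(pixel, invert_mask):
    """Bit-wise invert selected 8-bit color planes of one contone pixel.

    pixel       -- tuple of up to 4 plane values, each 0..255
    invert_mask -- assumed layout: bit i set means plane i is inverted
    """
    out = []
    for i, value in enumerate(pixel):
        if (invert_mask >> i) & 1:
            # For an 8-bit value, value ^ 0xFF equals 255 - value.
            out.append(value ^ 0xFF)
        else:
            out.append(value)
    return tuple(out)


# RGB in planes 0-2 becomes CMY; plane 3 (e.g. K) passes through unchanged.
rgbk = (255, 0, 10, 77)
cmyk = invert_planes(rgbk, 0b0111)  # -> (0, 255, 245, 77)
```

Applying the same mask twice returns the original pixel, which mirrors the symmetric relationships R=255−C and C=255−R given above.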

[2948] 23.5 Scaling

[2949] Scaling of pixel data is performed in the horizontal and vertical directions by the CFU so that the output to the HCU matches the printer resolution. The CFU supports non-integer scaling with the scale factor represented by a numerator and a denominator. Only scaling up of the pixel data is allowed, i.e. the numerator should be greater than or equal to the denominator. For example, to scale up by a factor of two and a half, the numerator is programmed as 5 and the denominator programmed as 2.

[2950] Scaling is implemented using a counter as described in the pseudocode below. An advance pulse is generated to move to the next dot (x-scaling) or line (y-scaling).

    if (count + denominator − numerator >= 0) then
        count = count + denominator − numerator
        advance = 1
    else
        count = count + denominator
        advance = 0
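The counter described above can be exercised in software to see how a non-integer scale factor is spread over successive output dots or lines. The Python sketch below is an illustration only: the function name scale_advances, the steps argument and the default start count of 0 are assumptions introduced for the example.

```python
def scale_advances(numerator, denominator, steps, start_count=0):
    """Return the advance flag produced on each of `steps` output dots/lines."""
    count = start_count
    advances = []
    for _ in range(steps):
        if count + denominator - numerator >= 0:
            # Enough has accumulated: move on to the next input pixel/line.
            count = count + denominator - numerator
            advances.append(1)
        else:
            # Replicate the current input pixel/line once more.
            count = count + denominator
            advances.append(0)
    return advances


# Scale factor 5/2 = 2.5: the first input is emitted 3 times, the second twice.
flags = scale_advances(5, 2, 5)  # -> [0, 0, 1, 0, 1]
```

Over every 5 outputs the counter returns to its starting value and 2 inputs have been consumed, giving the programmed ratio of 5/2.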

[2951] 23.6 Lead-In and Lead-Out Clipping

[2952] The JPEG algorithm encodes data on a block by block basis; each block consists of 64 8-bit pixels (representing 8 rows each of 8 pixels). If the image is not a multiple of 8 pixels in X and Y then padding must be present. This padding (extra pixels) will be present after decoding of the JPEG bytestream.

[2953] Extra padded lines in the Y direction (which may get scaled up in the CFU) will be ignored in the HCU through the setting of the BottomMargin register.

[2954] Extra padded pixels in the X direction must also be removed so that the contone layer is clipped to the target page as necessary.

[2955] In the case of a multi-SoPEC system, 2 SoPECs may be responsible for printing the same side of a page, e.g. SoPEC #1 controls printing of the left side of the page and SoPEC #2 controls printing of the right side of the page, as shown in FIG. 141. The division of the contone layer between the 2 SoPECs may not fall on an 8 pixel (JPEG block) boundary. The JPEG block on the boundary of the 2 SoPECs (JPEG block n below) will be the last JPEG block in the line printed by SoPEC #1 and the first JPEG block in the line printed by SoPEC #2. Pixels in this JPEG block not destined for SoPEC #1 are ignored by appropriately setting the LeadOutClipNum register. Pixels in this JPEG block not destined for SoPEC #2 must be ignored at the beginning of each line. The number of pixels to be ignored at the start of each line is specified by the LeadInClipNum register. It may also be the case that the CDU writes out more JPEG blocks than are required to be read by the CFU, as shown for SoPEC #2 below. In this case the value of the MaxBlock register in the CDU is set to correspond to JPEG block m but the value of the MaxBlock register in the CFU is set to correspond to JPEG block m−1. Thus JPEG block m is not read in by the CFU. Additional clipping of contone pixels is required when they are scaled up to the printer's resolution. The scaling of the first valid pixel in the line is controlled by setting the XstartCount register. The HcuLineLength register defines the size of the target page for the contone layer at the printer's resolution and controls the scaling of the last valid pixel in a line sent to the HCU.

[2956] 23.7 Implementation

[2957] FIG. 142 shows a block diagram of the CFU.

[2958] 23.7.1 Definitions of I/O

TABLE 150 CFU port list and description

Port Name | Pins | I/O | Description
Clocks and reset
pclk | 1 | In | System clock.
prst_n | 1 | In | System reset, synchronous active low.
PCU interface
pcu_cfu_sel | 1 | In | Block select from the PCU. When pcu_cfu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[6:2] | 5 | In | PCU address bus. Only 5 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
cfu_pcu_rdy | 1 | Out | Ready signal to the PCU. When cfu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on cfu_pcu_datain is valid.
cfu_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU.
DIU interface
cfu_diu_rreq | 1 | Out | CFU read request, active high. A read request must be accompanied by a valid read address.
diu_cfu_rack | 1 | In | Acknowledge from DIU, active high. Indicates that a read request has been accepted and the new read address can be placed on the address bus, cfu_diu_radr.
cfu_diu_radr[21:5] | 17 | Out | CFU read address. 17 bits wide (256-bit aligned word).
diu_cfu_rvalid | 1 | In | Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data.
diu_data[63:0] | 64 | In | Read data from DRAM.
CDU interface
cdu_cfu_linestore_rdy | 1 | In | When high indicates that the contone line store has 1 or more lines available to be read by the CFU.
cfu_cdu_rdadvline | 1 | Out | Read line pulse, active high. Indicates that the CFU has finished reading a line of decompressed contone data from the circular buffer in DRAM and that line of the buffer is now free.
HCU interface
hcu_cfu_advdot | 1 | In | Informs the CFU that the HCU has captured the pixel data on cfu_hcu_c[0-3]data lines and the CFU can now place the next pixel on the data lines.
cfu_hcu_avail | 1 | Out | Indicates valid data present on cfu_hcu_c[0-3]data lines.
cfu_hcu_c0data[7:0] | 8 | Out | Pixel of data in contone plane 0.
cfu_hcu_c1data[7:0] | 8 | Out | Pixel of data in contone plane 1.
cfu_hcu_c2data[7:0] | 8 | Out | Pixel of data in contone plane 2.
cfu_hcu_c3data[7:0] | 8 | Out | Pixel of data in contone plane 3.

[2959] 23.7.2 Configuration Registers

[2960] The configuration registers in the CFU are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for the description of the protocol and timing diagrams for reading and writing registers in the CFU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the CFU. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of cfu_pcu_datain. The configuration registers of the CFU are listed in Table 151:

TABLE 151 CFU registers

Address (CFU_base +) | Register Name | #bits | Value on Reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the CFU.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the CFU. Writing 0 to this register halts the CFU. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The CFU must be started before the CDU is started. This register can be read to determine if the CFU is running (1 - running, 0 - stopped).
Setup registers
0x10 | MaxBlock | 13 | 0x000 | Number of JPEG MCUs (or JPEG block equivalents, i.e. 8 × 8 bytes) in a line − 1.
0x14 | BuffStartAdr[21:7] | 15 | 0x0000 | Points to the start of the decompressed contone circular buffer in DRAM, aligned to a half JPEG block boundary. A half JPEG block consists of 4 words of 256-bits, enough to hold 32 contone pixels in 4 colors, i.e. half a JPEG block.
0x18 | BuffEndAdr[21:7] | 15 | 0x0000 | Points to the end of the decompressed contone circular buffer in DRAM, aligned to a half JPEG block boundary (address is inclusive). A half JPEG block consists of 4 words of 256-bits, enough to hold 32 contone pixels in 4 colors, i.e. half a JPEG block.
0x1C | 4LineOffset | 13 | 0x0000 | Defines the offset between the start of one 4 line store and the start of the next 4 line store − 1. If BuffStartAdr corresponds to line 0 block 0 then BuffStartAdr + 4LineOffset corresponds to line 4 block 0. 4LineOffset is specified in units of 128 bytes, e.g. 0 - 128 bytes, 1 - 256 bytes etc. This register is required in addition to MaxBlock as the number of JPEG blocks in a line required by the CFU may be different from the number of JPEG blocks in a line written by the CDU.
0x20 | YCrCb2RGB | 1 | 0x0 | Set this bit to enable conversion from YCrCb to RGB. Should not be changed between bands.
0x24 | InvertColorPlane | 4 | 0x0 | Set these bits to perform bit-wise inversion on a per color plane basis. bit0: 1 - invert color plane 0, 0 - do not invert. bit1: 1 - invert color plane 1, 0 - do not invert. bit2: 1 - invert color plane 2, 0 - do not invert. bit3: 1 - invert color plane 3, 0 - do not invert. Should not be changed between bands.
0x28 | HcuLineLength | 16 | 0x0000 | Number of contone pixels − 1 in a line (after scaling). Equals the number of hcu_cfu_dotadv pulses − 1 received from the HCU for each line of contone data.
0x2C | LeadInClipNum | 3 | 0x0 | Number of contone pixels to be ignored at the start of a line (from JPEG block 0 in a line). They are not passed to the output buffer to be scaled in the X direction.
0x30 | LeadOutClipNum | 3 | 0x0 | Number of contone pixels to be ignored at the end of a line (from JPEG block MaxBlock in a line). They are not passed to the output buffer to be scaled in the X direction.
0x34 | XstartCount | 8 | 0x00 | Value to be loaded at the start of every line into the counter used for scaling in the X direction. Used to control the scaling of the first pixel in a line to be sent to the HCU. This value will typically be zero, except in the case where a number of dots are clipped on the lead in to a line.
0x38 | XscaleNum | 8 | 0x01 | Numerator of contone scale factor in X direction.
0x3C | XscaleDenom | 8 | 0x01 | Denominator of contone scale factor in X direction.
0x40 | YscaleNum | 8 | 0x01 | Numerator of contone scale factor in Y direction.
0x44 | YscaleDenom | 8 | 0x01 | Denominator of contone scale factor in Y direction.

[2961] 23.7.3 Storage of Decompressed Contone Data in DRAM

[2962] The CFU reads decompressed contone data from DRAM in single 256-bit accesses. JPEG blocks of decompressed contone data are stored in DRAM in an arrangement chosen to optimize access for reads: the data is written so that 4 color components are stored together in each 256-bit DRAM word. This means that the CFU reads 64 bits in each of 4 colors from a single line in each 256-bit DRAM access.

[2963] The CFU reads data a line at a time in 4 colors from DRAM. The read sequence, as shown in FIG. 143, is as follows:

[2964] line 0, block 0 in word p of DRAM

[2965] line 0, block 1 in word p+4 of DRAM

[2966] . . .

[2967] line 0, block n in word p+4n of DRAM

[2968] (repeat to read line a number of times according to scale factor)

[2969] line 1, block 0 in word p+1 of DRAM

[2970] line 1, block 1 in word p+5 of DRAM

[2971] etc. . . .

[2972] The CFU reads a complete line in up to 4 colors a Y scale factor number of times from DRAM before it moves on to read the next. When the CFU has finished reading 4 lines of contone data, that 4 line store becomes available for the CDU to write to.
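The read sequence listed above follows a simple pattern: within a 4 line store starting at word p, line l of block b is held in word p + 4b + l. A short Python sketch of that ordering is given below. The function name read_sequence and the integer y_repeat argument are assumptions made for illustration; in the CFU the number of repeats is decided line by line by the Y-scaling counter rather than being a fixed integer.

```python
def read_sequence(p, max_block, y_repeat=1, lines=4):
    """Yield (line, block, word) in the order one 4 line store is read.

    p         -- DRAM word address of line 0, block 0
    max_block -- index of the last JPEG block in a line
    y_repeat  -- times each line is re-read (integer simplification)
    """
    for line in range(lines):
        for _ in range(y_repeat):
            for block in range(max_block + 1):
                # Each block occupies 4 consecutive words, one per line.
                yield line, block, p + 4 * block + line


# Two blocks per line, first two lines only:
seq = list(read_sequence(100, 1, lines=2))
# -> [(0, 0, 100), (0, 1, 104), (1, 0, 101), (1, 1, 105)]
```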

[2973] 23.7.4 Decompressed Contone Buffer

[2974] Since the CFU reads 256 bits (4 colors × 64 bits) from memory at a time, it requires storage of at least 2×256 bits at its input. To allow for all possible DIU stall conditions the input buffer is increased to 3×256 bits to meet the CFU target bandwidth requirements. The CFU receives the data from the DIU over 4 clock cycles (64 bits of a single color per cycle). The input buffer is implemented as 4 buffers. Each buffer conceptually is a 64-bit input and 8-bit output buffer to account for the 64-bit data transfers from the DIU, and the 8-bit output per color plane to the color space converter.

[2975] On the DRAM side, wr_buff indicates the current buffer within each triple-buffer that writes are to occur to. wr_sel selects which triple-buffer to write the 64 bits of data to when wr_en is asserted. On the color space converter side, rd_buff indicates the current buffer within each triple-buffer that reads are to occur from. When rd_en is asserted a byte is read from each of the triple-buffers in parallel. rd_sel is used to select a byte from the 64 bits (1st byte corresponds to bits 7-0, second byte to bits 15-8 etc.).

[2976] Due to the limitations of available register arrays in IBM technology, the decompressed contone buffer is implemented as a quadruple buffer. While this offers some benefits for the CFU it is not necessitated by the bandwidth requirements of the CFU.

[2977] 23.7.5 Y-Scaling Control Unit

[2978] The Y-scaling control unit is responsible for reading the decompressed contone data and passing it to the color space converter via the decompressed contone buffer. The decompressed contone data is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4 clock cycles (64 bits per cycle). The protocol and timing for read accesses to DRAM is described in section 20.9.1 on page 240. Read accesses to DRAM are implemented by means of the state machine described in FIG. 144.

[2979] All counters and flags should be cleared after reset. When Go transitions from 0 to 1 all counters and flags should take their initial value. While the Go bit is set, the state machine relies on the line8_ok_to_read and buff_ok_to_write flags to tell it whether to attempt to read a line of decompressed contone data from DRAM. When line8_ok_to_read is 0 the state machine does nothing. When line8_ok_to_read is 1 the state machine continues to load data into the decompressed contone buffer up to 256 bits at a time while there is space available in the buffer. A bit is kept for the status of each 64-bit buffer: buff_avail[0] and buff_avail[1]. It also keeps a single bit (rd_buff) for the current buffer that reads are to occur from, and a single bit (wr_buff) for the current buffer that writes are to occur to.

[2980] buff_ok_to_write equals ˜buff_avail[wr_buff]. When a wr_adv_buff pulse is received, buff_avail[wr_buff] is set, and wr_buff is inverted. Whenever diu_cfu_rvalid is asserted, wr_en is asserted to write the 64 bits of data from DRAM to the buffer selected by wr_sel and wr_buff.

[2981] buff_ok_to_read equals buff_avail[rd_buff]. If there is data available in the buffer and the output double-buffer has space available (outbuff_ok_to_write equals 1) then data is read from the buffer by asserting rd_en, and rd_sel is incremented to point to the next value. wr_adv is asserted in the following cycle to write the data to the output double-buffer of the CFU. When the buffer has been completely read (rd_sel equals b111 and rd_en is asserted), buff_avail[rd_buff] is cleared, and rd_buff is inverted.

[2982] Each line is read a number of times from DRAM, according to the Y-scale factor, before the CFU moves on to start reading the next line of decompressed contone data. Scaling to the printhead resolution in the Y direction is thus performed.

[2983] The pseudocode below shows how the read address from DRAM is calculated on a per clock cycle basis. Note all counters and flags should be cleared after reset or when Go is cleared. When a 1 is written to Go, both curr_halfblock and line_start_adr get loaded with buff_start_adr, and y_scale_count gets loaded with y_scale_denom. Scaling in the Y direction is implemented by line replication by re-reading lines from DRAM. The algorithm for non-integer scaling is described in the pseudocode below.

    // assign read address output to DRAM
    cfu_diu_radr[21:7] = curr_halfblock
    cfu_diu_radr[6:5] = line[1:0]
    // update block, line, y_scale_count and addresses after each DRAM read access
    if (wr_adv_buff == 1) then
        if (block == max_block) then
            // end of reading a line of contone in up to 4 colors
            block = 0
            // check whether to advance to next line of contone data in DRAM
            if (y_scale_count + y_scale_denom − y_scale_num >= 0) then
                y_scale_count = y_scale_count + y_scale_denom − y_scale_num
                pulse RdAdvline
                if (line == 3) then
                    // end of reading 4 line store of contone data
                    line = 0
                    // update half block address for start of next line taking account of
                    // address wrapping in circular buffer and 4 line offset
                    if (curr_halfblock == buff_end_adr) then
                        curr_halfblock = buff_start_adr
                        line_start_adr = buff_start_adr
                    elsif ((line_start_adr + 4line_offset) == buff_end_adr) then
                        curr_halfblock = buff_start_adr
                        line_start_adr = buff_start_adr
                    else
                        curr_halfblock = line_start_adr + 4line_offset
                        line_start_adr = line_start_adr + 4line_offset
                else
                    line ++
                    curr_halfblock = line_start_adr
            else
                // re-read current line from DRAM
                y_scale_count = y_scale_count + y_scale_denom
                curr_halfblock = line_start_adr
        else
            block ++
            curr_halfblock ++

[2984] 23.7.6 Contone Line Store Interface

[2985] The contone line store interface is responsible for providing the control over the shared resource in DRAM. The CDU writes 8 lines of data in up to 4 color planes, and the CFU reads them line-at-a-time. The contone line store interface provides the mechanism for keeping track of the number of lines stored in DRAM, and provides signals so that a given line cannot be read from until the complete line has been written.

[2986] A count is kept of the number of lines that have been written to DRAM by the CDU and are available to be read by the CFU. At start-up, buff_lines_avail is set to 0. The CFU may only begin to read from DRAM when the CDU has written 8 complete lines of contone data. When the CDU has finished writing 8 lines, it sends a cdu_cfu_wradv8line pulse to the CFU, and buff_lines_avail is incremented by 8. The CFU may continue reading from DRAM as long as buff_lines_avail is greater than 0. line8_ok_to_read is set while buff_lines_avail is greater than 0. When it has completely finished reading a line of contone data from DRAM, the Y-scaling control unit sends a RdAdvline signal to the contone line store interface and to the CDU to free up the line in the buffer in DRAM. buff_lines_avail is decremented by 1 on receiving a RdAdvline pulse.
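The bookkeeping described above amounts to a small counter with two events. The Python sketch below models it for illustration; the class name ContoneLineStore and the error raised on an underflow are assumptions of the example, since the text does not say how the hardware treats a RdAdvline pulse when no lines are available.

```python
class ContoneLineStore:
    """Software model of the CFU-side line store counter."""

    def __init__(self):
        # At start-up no lines are available to be read.
        self.buff_lines_avail = 0

    def wradv8line(self):
        """CDU has finished writing 8 complete lines."""
        self.buff_lines_avail += 8

    def rdadvline(self):
        """CFU has completely finished reading one line."""
        if self.buff_lines_avail == 0:
            # Assumed behaviour for the model only.
            raise RuntimeError("RdAdvline with no lines available")
        self.buff_lines_avail -= 1

    @property
    def line8_ok_to_read(self):
        return self.buff_lines_avail > 0


store = ContoneLineStore()
store.wradv8line()          # 8 lines now readable
for _ in range(8):
    store.rdadvline()       # CFU consumes them one at a time
# store.line8_ok_to_read is now False until the CDU writes 8 more lines
```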

[2987] 23.7.7 Color Space Converter (CSC)

[2988] The color space converter consists of 2 stages: optional color conversion from YCrCb to RGB followed by optional bit-wise inversion in up to 4 color planes.

[2989] The convert YCrCb to RGB block takes 3 8-bit inputs defined as Y, Cr, and Cb and outputs either the same data YCrCb or RGB. The YCrCb2RGB parameter is set to enable the conversion step from YCrCb to RGB. If YCrCb2RGB equals 0, the conversion does not take place, and the input pixels are passed to the second stage. The 4th color plane, if present, bypasses the convert YCrCb to RGB block. Note that the latency of the convert YCrCb to RGB block is 1 cycle. This latency should be equalized for the 4th color plane as it bypasses the block.

[2990] The second stage involves optional bit-wise inversion on a per color plane basis under the control of invert_color_plane. For example if the input is YCrCbK, then YCrCb2RGB can be set to 1 to convert YCrCb to RGB, and invert_color_plane can be set to 0111 to then convert the RGB to CMY, leaving K unchanged.

[2991] If YCrCb2RGB equals 0 and invert_color_plane equals 0000, no color conversion or color inversion will take place, so the output pixels will be the same as the input pixels.

[2992] FIG. 145 shows a block diagram of the color space converter.

[2993] The convert YCrCb to RGB block is an implementation of [14]. Although only 10 bits of coefficients are used (1 sign bit, 1 integer bit, 8 fractional bits), full internal accuracy is maintained with 18 bits. The conversion is implemented as follows:

[2994] R*=Y+(359/256)(Cr−128)

[2995] G*=Y−(183/256)(Cr−128)−(88/256)(Cb−128)

[2996] B*=Y+(454/256)(Cb−128)

[2997] R*, G* and B* are rounded to the nearest integer and saturated to the range 0-255 to give R, G and B. Note that, while a Reset results in all-zero output, a zero input gives output RGB=[0, 136, 0], because R* and B* saturate at 0 while G* evaluates to 135.5 and rounds to 136.
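The three equations and the rounding and saturation step can be checked with a few lines of integer arithmetic. The Python sketch below is illustrative only: the function names are invented for the example, and rounding is assumed to be round-half-up (adding 128 before dividing by 256), which is consistent with the zero-input result quoted above but is not stated explicitly in the text.

```python
def _round_and_saturate(scaled):
    """Divide a value scaled by 256, rounding half up, then clamp to 0..255."""
    value = (scaled + 128) >> 8   # floor((scaled + 128) / 256)
    return max(0, min(255, value))


def ycrcb_to_rgb(y, cr, cb):
    """Convert one 8-bit YCrCb triple to 8-bit RGB using the 8-bit fractions."""
    r = 256 * y + 359 * (cr - 128)
    g = 256 * y - 183 * (cr - 128) - 88 * (cb - 128)
    b = 256 * y + 454 * (cb - 128)
    return (_round_and_saturate(r),
            _round_and_saturate(g),
            _round_and_saturate(b))


zero_in = ycrcb_to_rgb(0, 0, 0)        # -> (0, 136, 0)
mid_grey = ycrcb_to_rgb(128, 128, 128)  # -> (128, 128, 128)
```

Keeping every term scaled by 256 until the final shift avoids floating point and matches the way the coefficients are expressed as fractions of 256.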

[2998] 23.7.8 X-Scaling Control Unit

[2999] The CFU has a 2×32-bit double-buffer at its output between the color space converter and the HCU. The X-scaling control unit performs the scaling of the contone data to the printer's output resolution, provides the mechanism for keeping track of the current read and write buffers, and ensures that a buffer cannot be read from until it has been written to.

[3000] A bit is kept for the status of each 32-bit buffer: buff_avail[0] and buff_avail[1]. It also keeps a single bit (rd_buff) for the current buffer that reads are to occur from, and a single bit (wr_buff) for the current buffer that writes are to occur to.

[3001] The output value outbuff_ok_to_write equals ˜buff_avail[wr_buff]. Contone pixels are counted as they are received from the Y-scaling control unit, i.e. when wr_adv is 1. Pixels in the lead-in and lead-out areas are ignored, i.e. they are not written to the output buffer. Lead-in and lead-out clipping of pixels is implemented by the following pseudocode that generates the wr_en pulse for the output buffer.

    if (wr_adv == 1) then
        if (pixel_count == {max_block,b111}) then
            pixel_count = 0
        else
            pixel_count ++
        if ((pixel_count < leadin_clip_num) OR
            (pixel_count > ({max_block,b111} − leadout_clip_num))) then
            wr_en = 0
        else
            wr_en = 1
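The effect of the clipping comparison on one line of pixels can be illustrated in software. The Python sketch below is a model under stated assumptions: the function name clip_write_enables is invented for the example, and the comparison is assumed to use the value of pixel_count before it is updated for the current pixel (register-style semantics), so that exactly leadin_clip_num pixels are dropped at the start of the line and leadout_clip_num at the end.

```python
def clip_write_enables(max_block, leadin_clip_num, leadout_clip_num):
    """Return wr_en (0 or 1) for every pixel of one line.

    The last pixel index is {max_block, b111}, i.e. 8 * max_block + 7.
    """
    last_pixel = (max_block << 3) | 0b111
    wr_en = []
    for pixel_count in range(last_pixel + 1):
        clipped = (pixel_count < leadin_clip_num or
                   pixel_count > last_pixel - leadout_clip_num)
        wr_en.append(0 if clipped else 1)
    return wr_en


# Two JPEG blocks per line (16 pixels), clip 2 at the start and 3 at the end:
enables = clip_write_enables(1, 2, 3)
# -> [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```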

[3002] When a wr_en pulse is sent to the output double-buffer, buff_avail[wr_buff] is set, and wr_buff is inverted.

[3003] The output cfu_hcu_avail equals buff_avail[rd_buff]. When cfu_hcu_avail equals 1, this indicates to the HCU that data is available to be read from the CFU. The HCU responds by asserting hcu_cfu_advdot to indicate that the HCU has captured the pixel data on cfu_hcu_c[0-3]data lines and the CFU can now place the next pixel on the data lines.

[3004] The input pixels from the CSC may be scaled a non-integer number of times in the X direction to produce the output pixels for the HCU at the printhead resolution. Scaling is implemented by pixel replication. The algorithm for non-integer scaling is described in the pseudocode below. Note that x_scale_count should be loaded with x_start_count after reset and at the end of each line. This controls the amount by which the first pixel is scaled. hcu_line_length and hcu_cfu_dotadv control the amount by which the last pixel in a line that is sent to the HCU is scaled.

    if (hcu_cfu_dotadv == 1) then
        if (x_scale_count + x_scale_denom − x_scale_num >= 0) then
            x_scale_count = x_scale_count + x_scale_denom − x_scale_num
            rd_en = 1
        else
            x_scale_count = x_scale_count + x_scale_denom
            rd_en = 0
    else
        x_scale_count = x_scale_count
        rd_en = 0

[3005] When a rd_en pulse is received, buff_avail[rd_buff] is cleared, and rd_buff is inverted.

[3006] A 16-bit counter, dot_adv_count, is used to keep a count of the number of hcu_cfu_dotadv pulses received from the HCU. If the value of dot_adv_count equals hcu_line_length and a hcu_cfu_dotadv pulse is received, then a rd_en pulse is generated to present the next dot at the output of the CFU, dot_adv_count is reset to 0 and x_scale_count is loaded with x_start_count.

[3007] 24 Lossless Bi-Level Decoder (LBD)

[3008] 24.1 Overview

[3009] The Lossless Bi-level Decoder (LBD) is responsible for decompressing a single plane of bi-level data. In SoPEC bi-level data is limited to a single spot color (typically black for text and line graphics).

[3010] The input to the LBD is a single plane of bi-level data, read as a bitstream from DRAM. The LBD is programmed with the start address of the compressed data, the length of the output (decompressed) line, and the number of lines to decompress. Although the requirement for SoPEC is to be able to print text at 10:1 compression, the LBD can cope with any compression ratio if the requested DRAM access is available. A pass-through mode is provided for 1:1 compression. Ten-point plain text compresses with a ratio of about 50:1. Lossless bi-level compression across an average page is about 20:1 with 10:1 possible for pages which compress poorly.

[3011] The output of the LBD is a single plane of decompressed bi-level data. The decompressed bi-level data is output to the SFU (Spot FIFO Unit), and in turn becomes an input to the HCU (Halftoner/Compositor unit) for the next stage in the printing pipeline. The LBD also outputs a lbd_finishedband control flag that is used by the PCU and is available as an interrupt to the CPU.

[3012] 24.2 Main Features of LBD

[3013] FIG. 147 shows a schematic outline of the LBD and SFU.

[3014] The LBD is required to support compressed images of up to 800 dpi. If possible we would like to support bi-level images of up to 1600 dpi. The line buffers must therefore be long enough to store a complete line at 1600 dpi.

[3015] The PEC1 LBD is required to output 2 dots/cycle to the HCU. This throughput capability is retained for SoPEC to minimise changes to the block, although in SoPEC the HCU will only read 1 dot/cycle. The PEC1 LBD outputs 16 bits in parallel to the PEC1 spot buffer. This is also retained for SoPEC. Therefore the LBD in SoPEC can run much faster than is required. This is useful for allowing stalls, e.g. due to band processing latency, to be absorbed.

[3016] The LBD has a pass through mode to cope with local negative compression. Pass through mode is activated by a special run-length code. Pass through mode continues to either end of line or for a pre-programmed number of bits, whichever is shorter. The special run-length code is always executed as a run-length code, followed by pass through.

[3017] The LBD outputs decompressed bi-level data to the NextLineFIFO in the Spot FIFO Unit (SFU). This stores the decompressed lines in DRAM, with a typical minimum of 2 lines stored in DRAM, nominally 3 lines up to a programmable number of lines. The SFU's NextLineFIFO can fill while the SFU waits for write access to DRAM. Therefore the LBD must be able to support stalling at its output during a line.

[3018] The LBD uses the previous line in the decoding process. This is provided by the SFU via its PrevLineFIFO. Decoding can stall in the LBD while this FIFO waits to be filled from DRAM. A signal sfu_lbd_rdy indicates that both the SFU's NextLineFIFO and PrevLineFIFO are available for writing and reading, respectively.

[3019] A configuration register in the LBD controls whether the first line being decoded at the start of a band uses the previous line read from the SFU or uses an all 0's line instead.

[3020] The length of the line stored in DRAM must be programmable to a value greater than 128. An A4 line of 13824 dots requires 1.7 Kbytes of storage. An A3 line of 19488 dots requires 2.4 Kbytes of storage.

[3021] The compressed spot data can be read at a rate of 1 bit/cycle for pass through mode 1:1 compression.

[3022] The LBD finished band signal is exported to the PCU and is additionally available to the CPU as an interrupt.

[3023] 24.2.1 Bi-Level Decoding in the LBD

[3024] The black bi-level layer is losslessly compressed using Silverbrook Modified Group 4 (SMG4) compression, which is a version of Group 4 Facsimile compression [22] without Huffman and with simplified run length encodings. The encodings are listed in Table 152 and Table 153.

TABLE 152 Bi-Level group 4 facsimile style compression encodings

Group | Encoding | Description
Same as Group 4 Facsimile | 1000 | Pass Command: a0 ← b2, skip next two edges
Same as Group 4 Facsimile | 1 | Vertical(0): a0 ← b1, color = !color
Same as Group 4 Facsimile | 110 | Vertical(1): a0 ← b1 + 1, color = !color
Same as Group 4 Facsimile | 010 | Vertical(−1): a0 ← b1 − 1, color = !color
Same as Group 4 Facsimile | 110000 | Vertical(2): a0 ← b1 + 2, color = !color
Same as Group 4 Facsimile | 010000 | Vertical(−2): a0 ← b1 − 2, color = !color
Unique to this implementation | 100000 | Vertical(3): a0 ← b1 + 3, color = !color
Unique to this implementation | 000000 | Vertical(−3): a0 ← b1 − 3, color = !color
Unique to this implementation | <RL><RL>100 | Horizontal: a0 ← a0 + <RL> + <RL>

[3025] SMG4 has a pass through mode to cope with local negative compression. Pass through mode is activated by a special run-length code. Pass through mode continues to either end of line or for a pre-programmed number of bits, whichever is shorter. The special run-length code is always executed as a run-length code, followed by pass through. The pass through escape code is a medium length run-length with a run of less than or equal to 31.

TABLE 153 Run length (RL) encodings (all unique to this implementation)

Encoding | Description
RRRRR1 | Short Black Runlength (5 bits)
RRRRR1 | Short White Runlength (5 bits)
RRRRRRRRRR10 | Medium Black Runlength (10 bits)
RRRRRRRR10 | Medium White Runlength (8 bits)
RRRRRRRRRR10 | Medium Black Runlength with RRRRRRRRRR <= 31, Enter pass through
RRRRRRRR10 | Medium White Runlength with RRRRRRRR <= 31, Enter pass through
RRRRRRRRRRRRRRR00 | Long Black Runlength (15 bits)
RRRRRRRRRRRRRRR00 | Long White Runlength (15 bits)

[3026] Since the compression is a bitstream, the encodings are read right (least significant bit) to left (most significant bit). The run lengths given as RRRRR in Table 153 are read in the same way (least significant bit at the right to most significant bit at the left).

[3027] There is an additional enhancement to the G4 fax algorithm, which relates to pass through mode. It is possible for data to compress negatively using the G4 fax algorithm. On occasions like this it would be easier to pass the data to the LBD as un-compressed data. Pass through mode is a new feature that was not implemented in the PEC1 version of the LBD. When the LBD is in pass through mode the least significant bit of the data stream is an un-compressed bit. This bit is used to construct the current line.

[3028] To enter pass through mode the LBD takes advantage of the way run lengths can be written.

[3029] Usually, if one of the runlength pair is less than or equal to 31 it should be encoded as a short runlength. However, under the coding scheme of Table 153 it is still legal to write it as a medium or long runlength. The LBD has been designed so that if a short runlength value is detected in a medium runlength, then once the horizontal command containing this runlength is decoded completely the LBD enters pass through mode and the bits following the runlength are un-compressed data. The number of bits to pass through is either a programmed number of bits or the end of the line, whichever comes first. Once the pass through mode is completed the current color is the same as the color of the last bit of the passed through data.
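The run-length encodings of Table 153 and the pass through escape can be illustrated with a minimal sketch. It assumes the unread bitstream is held in a Python integer whose bit 0 is the next bit to be consumed; the names `Run` and `decode_run` are hypothetical and are not taken from the LBD design.

```python
from typing import NamedTuple


class Run(NamedTuple):
    length: int         # run length in dots
    bits_used: int      # number of stream bits consumed by this code
    pass_through: bool  # True if this code is the pass through escape


def decode_run(stream: int, black: bool) -> Run:
    """Decode one run-length code; bit 0 of `stream` is the next unread bit."""
    if stream & 0b1:
        # RRRRR1: short run, 5-bit length above the flag bit.
        return Run((stream >> 1) & 0x1F, 6, False)
    if stream & 0b11 == 0b10:
        # R...R10: medium run, 10-bit length for black, 8-bit for white.
        width = 10 if black else 8
        length = (stream >> 2) & ((1 << width) - 1)
        # A short value (<= 31) written as a medium run is the escape code.
        return Run(length, width + 2, length <= 31)
    # R...R00: long run, 15-bit length.
    return Run((stream >> 2) & 0x7FFF, 17, False)
```

For example, `decode_run((7 << 2) | 0b10, black=False)` reports a run of 7 with the pass through flag set. In the LBD itself, pass through would only begin once the whole horizontal command (both run lengths) has been decoded; the sketch only flags the escape.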

[3030] 24.2.2 DRAM Access Requirements

[3031] The compressed page store for contone, bi-level and raw tag data is 2 Mbytes. The LBD will access the compressed page store in single 256-bit DRAM reads. The LBD will need a 256-bit double buffer in its interface to the DIU. The LBD's DIU bandwidth requirements are summarized in Table 154.

TABLE 154 DRAM bandwidth requirements
Direction | Maximum number of cycles between each 256-bit DRAM access | Peak Bandwidth (bits/cycle) | Average Bandwidth (bits/cycle)
Read | 256 (see note 1) (1:1 compression) | 1 (1:1 compression) | 0.1 (10:1 compression)

[3032] 1: At 1:1 compression the LBD requires 1 bit/cycle or 256 bits every 256 cycles.
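The figures in Table 154 follow from simple arithmetic, sketched below. The model assumes that the read bandwidth scales inversely with the compression ratio from the 1 bit/cycle peak; the function names are illustrative only.

```python
DRAM_WORD_BITS = 256  # size of one DRAM read, from the text


def read_bandwidth(compression_ratio: float,
                   peak_bits_per_cycle: float = 1.0) -> float:
    """Compressed bits consumed per cycle at the given compression ratio."""
    return peak_bits_per_cycle / compression_ratio


def cycles_between_reads(bits_per_cycle: float) -> float:
    """Cycles that one 256-bit DRAM word lasts at the given bandwidth."""
    return DRAM_WORD_BITS / bits_per_cycle
```

At 1:1 compression this gives 1 bit/cycle and one read every 256 cycles; at 10:1 it gives 0.1 bits/cycle, so one DRAM word lasts about 2560 cycles.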

[3033] 24.3 Implementation

[3034] 24.3.1 Definitions of IO

TABLE 155 LBD Port List
Port Name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | SoPEC Functional clock.
prst_n | 1 | In | Global reset signal.
Bandstore signals
cdu_endofbandstore[21:5] | 17 | In | Address of the end of the current band of data. 256-bit word aligned DRAM address.
cdu_startofbandstore[21:5] | 17 | In | Address of the start of the current band of data. 256-bit word aligned DRAM address.
lbd_finishedband | 1 | Out | LBD finished band signal to PCU and Interrupt Controller.
DIU Interface signals
lbd_diu_rreq | 1 | Out | LBD requests DRAM read. A read request must be accompanied by a valid read address.
lbd_diu_radr[21:5] | 17 | Out | Read address to DIU 17 bits wide (256-bit aligned word).
diu_lbd_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on lbd_diu_radr.
diu_data[63:0] | 64 | In | Data from DIU to SoPEC Units. First 64-bits is bits 63:0 of 256 bit word. Second 64-bits is bits 127:64 of 256 bit word. Third 64-bits is bits 191:128 of 256 bit word. Fourth 64-bits is bits 255:192 of 256 bit word.
diu_lbd_rvalid | 1 | In | Signal from DIU telling SoPEC Unit that valid read data is on the diu_data bus.
PCU Interface data and control signals
pcu_addr[5:2] | 4 | In | PCU address bus. Only 4 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
lbd_pcu_datain[31:0] | 32 | Out | Read data bus from the LBD to the PCU.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_lbd_sel | 1 | In | Block select from the PCU. When pcu_lbd_sel is high both pcu_addr and pcu_dataout are valid.
lbd_pcu_rdy | 1 | Out | Ready signal to the PCU. When lbd_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on lbd_pcu_datain is valid.
SFU Interface data and control signals
sfu_lbd_rdy | 1 | In | Ready signal indicating SFU has previous line data available for reading and is also ready to be written to.
lbd_sfu_advline | 1 | Out | Advance line signal to previous and next line buffers.
lbd_sfu_pladvword | 1 | Out | Advance word signal for previous line buffer.
sfu_lbd_pldata[15:0] | 16 | In | Data from the previous line buffer.
lbd_sfu_wdata[15:0] | 16 | Out | Write data for next line buffer.
lbd_sfu_wdatavalid | 1 | Out | Write data valid signal for next line buffer data.

[3035] 24.3.2 Configuration Registers

TABLE 156 LBD Configuration Registers
Address (LBD_base +) | Register Name | # Bits | Value on Reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the LBD. This register can be read to indicate the reset state: 0 - reset in progress, 1 - reset not in progress.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the LBD. Writing 0 to this register halts the LBD. The Go register is reset to 0 by the LBD when it finishes processing a band. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The LBD should only be started after the SFU is started. This register can be read to determine if the LBD is running (1 - running, 0 - stopped).
Setup registers (constant for during processing the page)
0x08 | LineLength | 16 | 0x0000 | Width of expanded bi-level line (in dots) (must be set greater than 128 bits).
0x0C | PassThroughEnable | 1 | 0x1 | Writing 1 to this register enables pass-through mode. Writing 0 to this register disables pass-through mode thereby making the LBD compatible with PEC1.
0x10 | PassThroughDotLength | 16 | 0x0000 | This is the dot length - 1 for which pass-through mode will last. If the end of the line is reached first then pass-through will be disabled. The value written to this register must be a non-zero value.
Work registers (need to be set up before processing a band)
0x14 | NextBandCurrReadAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | Shadow register which is copied to CurrReadAdr when (NextBandEnable == 1 & Go == 0). NextBandCurrReadAdr is the address of the start of the next band of compressed bi-level data in DRAM.
0x18 | NextBandLinesRemaining | 15 | 0x0000 | Shadow register which is copied to LinesRemaining when (NextBandEnable == 1 & Go == 0). NextBandLinesRemaining is the number of lines to be decoded in the next band of compressed bi-level data.
0x1C | NextBandPrevLineSource | 1 | 0x0 | Shadow register which is copied to PrevLineSource when (NextBandEnable == 1 & Go == 0). 1 - use the previous line read from the SFU for decoding the first line at the start of the next band. 0 - ignore the previous line read from the SFU for decoding the first line at the start of the next band (an all 0's line is used instead).
0x20 | NextBandEnable | 1 | 0x0 | If (NextBandEnable == 1 & Go == 0) then NextBandCurrReadAdr is copied to CurrReadAdr, NextBandLinesRemaining is copied to LinesRemaining, NextBandPrevLineSource is copied to PrevLineSource, Go is set, NextBandEnable is cleared. To start LBD processing NextBandEnable should be set.
Work registers (read only for external access)
0x24 | CurrReadAdr[21:5] (256-bit aligned DRAM address) | 17 | - | The current 256-bit aligned read address within the compressed bi-level image (DRAM address). Read only register.
0x28 | LinesRemaining | 15 | - | Count of number of lines remaining to be decoded. The band has finished when this number reaches 0. Read only register.
0x2C | PrevLineSource | 1 | - | 1 - uses the previous line read from the SFU for decoding the first line at the start of the next band. 0 - ignores the previous line read from the SFU for decoding the first line at the start of the next band (an all 0's line is used instead). Read only register.
0x30 | CurrWriteAdr | 15 | - | The current dot position for writing to the SFU. Read only register.
0x34 | FirstLineOfBand | 1 | - | Indicates whether the current line is considered to be the first line of the band. Read only register.

[3036] 24.3.3 Starting the LBD Between Bands

[3037] The LBD should be started after the SFU. The LBD is programmed with a start address for the compressed bi-level data, a decode line length, the source of the previous line and a count of how many lines to decode. The LBD's NextBandEnable bit should then be set (this will set LBD Go). The LBD decodes a single band and then stops, clearing its Go bit and issuing a pulse on lbd_finishedband. The LBD can then be restarted for the next band, while the HCU continues to process previously decoded bi-level data from the SFU.

[3038] There are 4 mechanisms for restarting the LBD between bands:

[3039] a. lbd_finishedband causes an interrupt to the CPU. The LBD will have stopped and cleared its Go bit. The CPU reprograms the LBD, typically the NextBandCurrReadAdr, NextBandLinesRemaining and NextBandPrevLineSource shadow registers, and sets NextBandEnable to restart the LBD.

[3040] b. The CPU programs the LBD's NextBandCurrReadAdr, NextBandLinesRemaining, and NextBandPrevLineSource shadow registers and sets the NextBandEnable flag before the end of the current band. At the end of the band the LBD clears Go. NextBandEnable is already set, so the LBD restarts immediately.

[3041] c. The PCU is programmed so that lbd_finishedband triggers the PCU to execute commands from DRAM to reprogram the LBD's NextBandCurrReadAdr, NextBandLinesRemaining, and NextBandPrevLineSource shadow registers and set NextBandEnable to restart the LBD. The advantage of this scheme is that the CPU could process band headers in advance and store the band commands in DRAM ready for execution.

[3042] d. This is a combination of b and c above. The PCU (rather than the CPU in b) programs the LBD's NextBandCurrReadAdr, NextBandLinesRemaining, and NextBandPrevLineSource shadow registers and sets the NextBandEnable flag before the end of the current band. At the end of the band the LBD clears Go and pulses lbd_finishedband. NextBandEnable is already set, so the LBD restarts immediately. Simultaneously, lbd_finishedband triggers the PCU to fetch commands from DRAM. The LBD will have restarted by the time the PCU has fetched commands from DRAM. The PCU commands program the LBD's shadow registers and set NextBandEnable for the next band.
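The interaction between Go, NextBandEnable and the shadow registers in the restart mechanisms can be sketched as a small behavioural model. This is an illustration under simplifying assumptions, not the hardware implementation: the class name `LbdBandControl` and its methods are hypothetical, timing is ignored, and only the register copy rule (NextBandEnable == 1 & Go == 0) from Table 156 is modelled.

```python
class LbdBandControl:
    """Toy model of the Go / NextBandEnable handshake (not cycle-accurate)."""

    def __init__(self):
        self.go = 0
        self.next_band_enable = 0
        # Shadow (NextBand*) registers and the working registers they feed.
        self.shadow = {"CurrReadAdr": 0, "LinesRemaining": 0, "PrevLineSource": 0}
        self.work = dict(self.shadow)
        self.finished_band_pulses = 0

    def program_next_band(self, read_adr, lines, prev_line_source):
        """CPU or PCU writes the shadow registers, then sets NextBandEnable."""
        self.shadow = {"CurrReadAdr": read_adr,
                       "LinesRemaining": lines,
                       "PrevLineSource": prev_line_source}
        self.next_band_enable = 1
        self._try_start()

    def _try_start(self):
        # Copy rule from the register table: NextBandEnable == 1 and Go == 0.
        if self.next_band_enable == 1 and self.go == 0:
            self.work = dict(self.shadow)
            self.go = 1
            self.next_band_enable = 0

    def finish_band(self):
        """End of band: clear Go, pulse lbd_finishedband, restart if armed."""
        self.go = 0
        self.finished_band_pulses += 1
        self._try_start()
```

In mechanism a, `program_next_band` is called after `finish_band`; in mechanisms b and d it is called while Go is still set, so the shadow values are held until the band ends and the restart is immediate.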

[3043] 24.3.4 Top-Level Description

[3044] A block diagram of the LBD is shown in FIG. 148.

[3045] The LBD contains the following sub-blocks:

TABLE 157 Functional sub-blocks in the LBD
Name | Description
Registers and Resets | PCU interface and configuration registers. Also generates the Go and the Reset signals for the rest of the LBD.
Stream Decoder | Accesses the bi-level description from the DRAM through the DIU interface. It decodes the bit stream into a command with arguments, which it then passes to the command controller.
Command Controller | Interprets the command from the stream decoder and provides the line fill unit with a limit address and color to fill the SFU Next Line Buffer. It also provides the next edge unit with a starting address to look for the next edge.
Next Edge Unit | Scans through the Previous Line Buffer using its current address to find the next edge of a color provided by the command controller. The next edge unit outputs this as the next current address back to the command controller and sets a valid bit when this address is at the next edge.
Line Fill Unit | Fills the SFU Next Line Buffer with a color from its current address up to a limit address. The color and limit are provided by the command controller.

[3046] In the following description the LBD decodes data for its current decode line but writes this data into the SFU's next line buffer.

[3047] Naming of signals and logical blocks is taken from [22].

[3048] The LBD is able to stall mid-line should the SFU be unable to supply a previous line or receive a current line frame due to band processing latency.

[3049] All output control signals from the LBD must always be valid after reset. For example, if the LBD is not currently decoding, lbd_sfu_advline (to the SFU) and lbd_finishedband will always be 0.

[3050] 24.3.5 Registers and Resets Sub-Block Description

[3051] Since the CDU, LBD and TE all access the page band store, they share two registers that enable sequential memory accesses to the page band stores to be circular in nature. The CDU chapter lists these two registers. The register descriptions for the LBD are listed in Table 156.

[3052] During initialisation of the LBD, the LineLength and the LinesRemaining configuration values are written to the LBD. The ‘Registers and Resets’ sub-block supplies these signals to the other sub-blocks in the LBD. In the case of LinesRemaining, this number is decremented for every line that is completed by the LBD.

[3053] If pass through is used during a band the PassThroughEnable register needs to be programmed and PassThroughDotLength programmed with the length of the compressed bits in pass through mode.

[3054] PrevLineSource is programmed during the initialisation of a band. If the previous line supplied for the first line is a valid previous line, a 1 is written to PrevLineSource so that the data is used. If a 0 is written, the LBD ignores the previous line information supplied and acts as if it is receiving all zeros for the previous line regardless of what the output of the SFU is.

[3055] The ‘Registers and Resets’ sub-block also generates the resets used by the rest of the LBD and the Go bit which tells the LBD that it can start requesting data from the DIU and commence decoding of the compressed data stream.

[3056] 24.3.6 Stream Decoder Sub-Block Description

[3057] The Stream Decoder reads the compressed bi-level image from the DRAM via the DIU (single accesses of 256-bits) into a double 256-bit FIFO. The barrel shift register uses the 64-bit word from the FIFO to fill up the empty space created by the barrel shift register as it is shifting its contents. The bit stream is decoded into a command/arguments pair, which in turn is passed to the command controller.

[3058] A dataflow block diagram of the stream decoder is shown in FIG. 149.

[3059] 24.3.6.1 DecodeC—Decode Command

[3060] The DecodeC logic decodes the command from bits 6..0 of the bit stream to output one of three commands: SKIP, VERTICAL and RUNLENGTH. It also provides an output to indicate how many bits were consumed, which feeds back to the barrel shift register.

[3061] There is a fourth command, PASS_THROUGH, which is not encoded in bits 6..0; instead it is inferred from a special runlength. If the stream decoder detects a short runlength value, i.e. a number less than or equal to 31, encoded as a medium runlength, this tells the Stream Decoder that once the horizontal command containing this runlength is decoded completely the LBD enters PASS_THROUGH mode. Following the runlength there will be a number of bits that represent un-compressed data. The LBD will stay in PASS_THROUGH mode until all these bits have been decoded successfully; this will occur once a programmed number of bits is reached or the line ends, whichever comes first.

[3062] 24.3.6.2 DecodeD—Decode Delta

[3063] The DecodeD logic decodes the run length from bits 20..3 of the bit stream. If DecodeC is decoding a vertical command, it will cause DecodeD to put constants of −3 through 3 on its output. The output delta is a 15 bit number, which is generally considered to be positive, but since it needs to only address to 13824 dots for an A4 page and 19488 dots for an A3 page (of 32,768), a 2's complement representation of −3, −2, −1 will work correctly for the data pipeline that follows. This unit also outputs how many bits were consumed.
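The reason a 2's complement delta works in a 15-bit unsigned address pipeline can be checked with a short sketch. The helper names below are hypothetical; the only facts used are the 15-bit width and the vertical offsets of −3 through 3.

```python
ADDR_BITS = 15
ADDR_MASK = (1 << ADDR_BITS) - 1  # 0x7FFF, addresses 0..32767


def encode_delta(n: int) -> int:
    """15-bit 2's complement pattern for a vertical offset n in -3..3."""
    return n & ADDR_MASK


def add_delta(addr: int, delta: int) -> int:
    """15-bit adder: the carry out of bit 14 is simply discarded."""
    return (addr + delta) & ADDR_MASK
```

For instance, −3 is encoded as 32765, and `add_delta(100, 32765)` yields 97. Because valid dot addresses stay well below 32,768, the wrap-around of the adder never produces an ambiguous result.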

[3064] In the case of PASS_THROUGH mode, DecodeD parses the bits that represent the un-compressed data and this is used by the Line Fill Unit to construct the current line frame.

[3065] DecodeD parses the bits at one bit per clock cycle and passes the bit in the least significant bit location of delta to the line fill unit.

[3066] DecodeD currently needs to know the color of the run length to decode it correctly, as black and white runs are encoded differently. The stream decoder keeps track of the next color based on the current color and the current command.

[3067] 24.3.6.3 State-Machine

[3068] This state machine continuously fetches consecutive DRAM data whenever there is enough free space in the FIFO, thereby keeping the barrel shift register full so it can continually decode commands for the command controller. Note in FIG. 149 that each read cycle curr_read_addr is compared to end_of_band_store. If the two are equal, curr_read_addr is loaded with start_of_band_store (circular memory addressing). Otherwise curr_read_addr is simply incremented. start_of_band_store and end_of_band_store need to be programmed so that the distance between them is a multiple of the 256-bit DRAM word size.
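The circular addressing rule described above amounts to a compare-and-wrap on each read. A minimal sketch follows, assuming the addresses are 256-bit word addresses (bits [21:5] of the DRAM address), so that incrementing by one advances one DRAM word; the function name is illustrative.

```python
def next_read_addr(curr_read_addr: int,
                   start_of_band_store: int,
                   end_of_band_store: int) -> int:
    """Advance the read pointer by one 256-bit word, wrapping at the end."""
    if curr_read_addr == end_of_band_store:
        return start_of_band_store  # circular memory addressing
    return curr_read_addr + 1
```

With a band store spanning word addresses 0x10 to 0x13, the sequence of reads starting at 0x12 would be 0x12, 0x13, 0x10, 0x11, and so on.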

[3069] When the state machine decodes a SKIP command, the state machine provides two SKIP instructions to the command controller.

[3070] The RUNLENGTH command has two different run lengths. The two run lengths are passed to the command controller as separate RUNLENGTH instructions. In the first instruction fetch, the first run length is passed, and the state machine selects the DecodeD shift value for the barrel shift. In the second instruction fetch from the command controller another RUNLENGTH instruction is generated and the respective shift value is decoded. This is achieved by forcing DecodeC to output a second RUNLENGTH instruction.

[3071] For PASS_THROUGH mode, the PASS_THROUGH command is issued every time the command controller requests a new command. It does this until all the un-compressed bits have been processed.

[3072] 24.3.7 Command Controller Sub-Block Description

[3073] The Command Controller interprets the command from the Stream Decoder and provides the line fill unit with a limit address and color to fill the SFU Next Line Buffer. It provides the next edge unit with a starting address to look for the next edge and is responsible for detecting the end of line and generating the eob_cc signal that is passed to the line fill unit.

[3074] A dataflow block diagram of the command controller is shown in FIG. 150. Note that data names such as a0 and b1p are taken from [22], and they denote the reference or starting changing element on the coding line and the first changing element on the reference line to the right of a0 and of the opposite color to a0 respectively.

[3075] 24.3.7.1 State Machine

[3076] The following is an explanation of all the states that the state machine utilizes.

[3077] i Start

[3078] This is the state that the Command Controller enters when a hard or soft reset occurs or when Go has been de-asserted. This state cannot be left until the reset has been removed, Go has been asserted and the NEU (Next Edge Unit), the SD (Stream Decoder) and the SFU are ready.

[3079] ii AWAIT_BUFFER

[3080] The NEU contains a buffer memory for the data it receives from the SFU. When the command controller enters this state the NEU detects this and starts buffering data. The command controller is able to leave this state when the state machine in the NEU has entered the NEU_RUNNING state. Once this occurs the command controller can proceed to the PARSE state.

[3081] iii PAUSE_CC

[3082] During the decode of a line it is possible for the FIFO in the stream decoder to get starved of data if the DRAM is not able to supply replacement data fast enough. Additionally the SFU can also stall mid-line due to band processing latency. If either of these cases occurs the LBD needs to pause until the stream decoder gets more of the compressed data stream from the DRAM or the SFU can receive or deliver new frames. All of the remaining states check if sdvalid goes to zero (this denotes a starving of the stream decoder) or if sfu_lbd_rdy goes to zero, either of which means that the LBD needs to pause. PAUSE_CC is the state that the command controller enters to achieve this and it does not leave this state until sdvalid and sfu_lbd_rdy are both asserted and the LBD can recommence decompressing.

[3083] iv PARSE

[3084] Once the command controller enters the PARSE state it uses the information that is supplied by the stream decoder. In the first clock cycle of the state the sdack signal is asserted, informing the stream decoder that the current register information is being used so that it can fetch the next command.

[3085] When in this state the command controller can receive one of four valid commands:

[3086] a) Runlength or Horizontal

[3087] For this command the value given as delta is an integer that denotes the number of bits of the current color that must be added to the current line.

[3088] Should the current line position, a0, be added to the delta and the result be greater than the final position of the current frame being processed by the Line Fill Unit (only 16 bits at a time), it is necessary for the command controller to wait for the Line Fill Unit (LFU) to process up to that point. The command controller changes into the WAIT_FOR_RUNLENGTH state while this occurs.

[3089] When the current line position, a1, and the delta together equal or exceed the LINE_LENGTH, which is programmed during initialisation, then this denotes that it is the end of the current line. The command controller signals this to the rest of the LBD and then returns to the START state.

[3090] b) Vertical

[3091] When this command is received, it tells the command controller that, in the previous line, it needs to find a change from the current color to the opposite of the current color, i.e. if the current color is white it looks from the current position in the previous line for the next place where there is a change in color from white to black. It is important to note that if a black to white change occurs first it is ignored.

[3092] Once this edge has been detected, the delta will denote which of the vertical commands to use, refer to Table. The delta will denote where the changing element in the current line is relative to the changing element on the previous line; for a Vertical(2), the new changing element position in the current line is two bits beyond the changing element position in the previous line.

[3093] Should the next edge not be detected in the current frame under review in the NEU, then the command controller enters the WAIT_FOR_NE state and waits there until the next edge is found.

[3094] c) Skip

[3095] A skip follows the same functionality as two Vertical(0) commands, but the color in the current line is not changed as it is being filled out. The stream decoder supplies what look like two separate skip commands, which the command controller treats the same as two Vertical(0) commands, and it has been coded not to change the current color in this case.

[3096] d) Pass Through

[3097] When in pass through mode the stream decoder supplies one bit per clock cycle that is used to construct the current frame. Once pass through mode is completed, which is controlled in the stream decoder, the LBD can recommence normal decompression. The current color after pass through mode is the same color as the last bit in the un-compressed data stream. Pass through mode does not need an extra state in the command controller as each pass through command received from the stream decoder can always be processed in one clock cycle.

[3098] v WAIT_FOR_RUNLENGTH

[3099] As some RUNLENGTHs can carry over more than one 16-bit frame, the Line Fill Unit needs longer than one clock cycle to write out all the bits represented by the RUNLENGTH.

[3100] After the first clock cycle the command controller enters into the WAIT_FOR_RUNLENGTH state until all the RUNLENGTH data has been consumed. Once finished, and provided it is not the end of the line, the command controller will return to the PARSE state.

[3101] vi WAIT_FOR_NE

[3102] Similar to the RUNLENGTH commands, the vertical commands can sometimes not find an edge in the current 16-bit frame. After the first clock cycle the command controller enters the WAIT_FOR_NE state and remains here until the edge is detected. Provided it is not the end of the line the command controller will return to the PARSE state.

[3103] vii FINISH_LINE

[3104] At the end of a line the command controller needs to hold its data for the SFU before going back to the START state. The command controller remains in the FINISH_LINE state for one clock cycle to achieve this.

[3105] 24.3.8 Next Edge Unit Sub-Block Description

[3106] The Next Edge Unit (NEU) is responsible for detecting color changes, or edges, in the previous line based on the current address and color supplied by the Command Controller. The NEU is the interface to the SFU and it buffers the previous line for detecting an edge. For an edge detect operation the Command Controller supplies the current address; this is typically the location of the last edge, but it could also be the end of a run length. With the current address a color is also supplied and using these two values the NEU will search the previous line for the next edge. If an edge is found the NEU returns this location to the Command Controller as the next address in the current line and it sets a valid bit to tell the Command Controller that the edge has been detected. The Line Fill Unit uses this result to construct the current line. The NEU operates on 16-bit words and it is possible that there is no edge in the current 16 bits in the NEU. In this case the NEU will request more words from the SFU and will keep searching for an edge. It will continue doing this until it finds an edge or reaches the end of the previous line, which is based on the LINE_LENGTH. A dataflow block diagram of the Next Edge unit is shown in FIG. 152.

[3107] 24.3.8.1 NEU Buffer

[3108] The algorithm being employed for decompression is based on the whole previous line and is not delineated during the line. However the Next Edge Unit, NEU, can only receive 16 bits at a time from the SFU. This presents a problem for vertical commands if the edge occurs in the successive frame, but refers to a changing element in the current frame.

[3109] To accommodate this the NEU works on two frames at the same time, the current frame and the first 3 bits from the successive frame. This allows for the information that is needed from the previous line to construct the current frame of the current line.

[3110] In addition to this buffering there is also buffering right after the data is received from the SFU as the SFU output is not registered. The current implementation of the SFU takes two clock cycles from when a request for a current line is received until it is returned and registered. However when the NEU requests a new frame it needs it on the next clock cycle to maintain a decoded rate of 2 bits per clock cycle. A more detailed diagram of the buffer in the NEU is shown in FIG. 153.

[3111] The outputs of the buffer are two 16-bit vectors, use_prev_line_a and use_prev_line_b, that are used to detect an edge that is relevant to the current line being put together in the Line Fill Unit.

[3112] 24.3.8.2 NEU Edge Detect

[3113] The NEU Edge Detect block takes the two 16 bit vectors supplied by the buffer and, based on the current line position in the current line, a0, and the current color, sd_color, it will detect if there is an edge relevant to the current frame. If the edge is found it supplies the current line position, b1p, to the command controller and the line fill unit. The configuration of the edge detect is shown in FIG. 154.

[3114] The two vectors from the buffer, use_prev_line_a and use_prev_line_b, pass into two sub-blocks, transition_wtob and transition_btow. transition_wtob detects if any white to black transitions occur in the 19 bit vector supplied and outputs a 19-bit vector displaying the transitions. transition_btow is functionally the same as transition_wtob, but it detects black to white transitions.

[3115] The two 19-bit vectors produced enter into a multiplexer and the output of the multiplexer is controlled by color_neu. color_neu is the current edge transition color that the edge detect is searching for.

[3116] The output of the multiplexer is masked against a 19-bit vector. The mask is comprised of three parts concatenated together: decode_b_ext, decode_b and FIRST_FLU_WRITE.

[3117] The outputs of transition_wtob (and its complement transition_btow) are all the transitions in the 16 bit word that is under review. The decode_b is a mask generated from a0. In bit-wise terms all the bits above and including a0 are 1's and all bits below a0 are 0's. When they are gated together it means that all the transitions below a0 are ignored and the first transition after a0 is picked out as the next edge.

[3118] The decode_b block decodes the 4 lsb of the current address (a0) into 16-bit mask bits that control which of the data bits are examined. Table 158 shows the truth table for this block.

TABLE 158 Decode_b truth table
input | output
0000 | 1111111111111111
0001 | 1111111111111110
0010 | 1111111111111100
0011 | 1111111111111000
0100 | 1111111111110000
0101 | 1111111111100000
0110 | 1111111111000000
0111 | 1111111110000000
1000 | 1111111100000000
1001 | 1111111000000000
1010 | 1111110000000000
1011 | 1111100000000000
1100 | 1111000000000000
1101 | 1110000000000000
1110 | 1100000000000000
1111 | 1000000000000000
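The truth table of Table 158 is equivalent to shifting an all-ones word left by the 4 least significant bits of a0 and keeping the low 16 bits. A minimal sketch, with the Python function name chosen to mirror the block name:

```python
def decode_b(a0: int) -> int:
    """16-bit mask with 1's at and above bit (a0 mod 16), 0's below it."""
    shift = a0 & 0xF  # only the 4 lsb of the current address are used
    return (0xFFFF << shift) & 0xFFFF
```

For example, `format(decode_b(5), "016b")` gives `1111111111100000`, matching the row for input 0101.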

[3119] For cases when there is a negative vertical command from the stream decoder it is possible that the edge is in the three lower significant bits of the next frame. The decode_b_ext block supplies the mask so that the necessary bits can be used by the NEU to detect an edge if present. Table 159 shows the truth table of this block.

TABLE 159 Decode_b_ext truth table
delta | output
Vertical(−3) | 111
Vertical(−2) | 111
Vertical(−1) | 011
OTHERS | 001

[3120] FIRST_FLU_WRITE is only used in the first frame of the current line. 2.2.5 a) in [22] refers to “Processing the first picture element”, in which it states that “The first starting picture element, a0, on each coding line is imaginarily set at a position just before the first picture element, and is regarded as a white picture element”. transition_wtob and transition_btow are set up to produce this case for every single frame. However it is only used by the NEU if it is not masked out. This occurs when FIRST_FLU_WRITE is ‘1’, which is only asserted at the beginning of a line. 2.2.5 b) in [22] covers the case of “Processing the last picture element”; this case states that “The coding of the coding line continues until the position of the imaginary changing element situated after the last actual element is coded”. This means that no matter what the current color is the NEU needs to always find an edge at the end of a line. This feature is used with negative vertical commands.

[3121] The vector, end_frame, is a “one-hot” vector that is asserted during the last frame. It asserts a bit in the end of line position, as determined by LineLength, and this simulates an edge in this location which is ORed with the transitions vector. The output of this, masked_data, is sent into the encode_b_one_hot block.

[3122] 24.3.8.3 Encode_b_one_hot

[3123] The encode_b_one_hot block is the first stage of a two stage process that encodes the data to determine the address of the 0 to 1 transition. Table 160 lists the truth table outlining the functionality required by this block.

TABLE 160 Encode_b_one_hot Truth Table
Input | output
XXXXXXXXXXXXXXXXXX1 | 0000000000000000001
XXXXXXXXXXXXXXXXX10 | 0000000000000000010
XXXXXXXXXXXXXXXX100 | 0000000000000000100
XXXXXXXXXXXXXXX1000 | 0000000000000001000
XXXXXXXXXXXXXX10000 | 0000000000000010000
XXXXXXXXXXXXX100000 | 0000000000000100000
XXXXXXXXXXXX1000000 | 0000000000001000000
XXXXXXXXXXX10000000 | 0000000000010000000
XXXXXXXXXX100000000 | 0000000000100000000
XXXXXXXXX1000000000 | 0000000001000000000
XXXXXXXX10000000000 | 0000000010000000000
XXXXXXX100000000000 | 0000000100000000000
XXXXXX1000000000000 | 0000001000000000000
XXXXX10000000000000 | 0000010000000000000
XXXX100000000000000 | 0000100000000000000
XXX1000000000000000 | 0001000000000000000
XX10000000000000000 | 0010000000000000000
X100000000000000000 | 0100000000000000000
1000000000000000000 | 1000000000000000000
0000000000000000000 | 0000000000000000000

[3124] The output of encode_b_one_hot is a “one-hot” vector that will denote where that edge transition is located. In cases of multiple edges, only the first one will be picked.
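The behaviour in Table 160, keeping only the lowest asserted bit of the 19-bit input, can be expressed with the usual isolate-lowest-set-bit idiom. The sketch below is an illustration of the truth table only; it does not claim to reflect how the logic is realised in hardware.

```python
VECTOR_BITS = 19
VECTOR_MASK = (1 << VECTOR_BITS) - 1


def encode_b_one_hot(masked_data: int) -> int:
    """Return a one-hot vector marking the lowest set bit (0 if none set)."""
    masked_data &= VECTOR_MASK
    # x & -x isolates the least significant 1 bit of x.
    return masked_data & -masked_data
```

An input with edges at bits 1 and 3 (`0b1010`) yields `0b10`, so only the first edge is picked, and an all-zero input yields zero.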

[3125] 24.3.8.4 Encode_b_4bit

[3126] Encode_b_4bit is the second stage of the two stage process that encodes the data to determine the address of the 0 to 1 transition.

[3127] Encode_b_4bit receives the “one-hot” vector from encode_b_one_hot and determines the bit location that is asserted. If there is none present this means that there was no edge present in this frame. If there is a bit asserted the bit location in the vector is converted to a number, for example if bit 0 is asserted then the number is zero, if bit one is asserted then the number is one, etc. The delta supplied to the NEU determines what vertical command is being processed. The formula that is implemented to return b1p to the command controller is: for V(n), b1p = (x + n) modulus 16

[3128] where x is the number that was extracted from the “one-hot” vector and n is the vertical command.

[3129] 24.3.8.5 State Machine

[3130] The following is an explanation of all the states that the NEU state machine utilizes.

[3131] i NEU_START

[3132] This is the state that the NEU enters when a hard or soft reset occurs or when Go has been de-asserted. This state cannot be left until the reset has been removed, Go has been asserted and the NEU detects that the command controller has entered its AWAIT_BUFF state. When this occurs the NEU enters the NEU_FILL_BUFF state.

[3133] ii NEU_FILL_BUFF

[3134] Before any compressed data can be decoded the NEU needs to fill up its buffer with new data from the SFU. The rest of the LBD waits while the NEU retrieves the first four frames from the previous line. Once completed it enters the NEU_HOLD state.

[3135] iii NEU_HOLD

[3136] The NEU waits in this state for one clock cycle while data requested from the SFU on the last access returns.

[3137] iv NEU_RUNNING

[3138] NEU_RUNNING controls the requesting of data from the SFU for the remainder of the line by pulsing lbd_sfu_pladvword when the LBD needs a new frame from the SFU. When the NEU has received all the words it needs for the current line, as denoted by the LineLength, the NEU enters the NEU_EMPTY state.

[3139] v NEU_EMPTY

[3140] The NEU waits in this state while the rest of the LBD finishes outputting the completed line to the SFU. The NEU leaves this state when Go is deasserted. This occurs when the end_of_line signal is detected from the LBD.
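The state descriptions above can be summarised as a next-state function. The Python fragment below is only a sketch under stated assumptions: the input names (cc_await_buff, buffer_filled, line_words_done) are hypothetical, and the transition from NEU_HOLD to NEU_RUNNING after one cycle is inferred from the description of NEU_HOLD rather than stated explicitly.

```python
from enum import Enum, auto


class NeuState(Enum):
    NEU_START = auto()
    NEU_FILL_BUFF = auto()
    NEU_HOLD = auto()
    NEU_RUNNING = auto()
    NEU_EMPTY = auto()


def neu_next_state(state, *, reset=False, go=True, cc_await_buff=False,
                   buffer_filled=False, line_words_done=False):
    """Return the state for the next clock cycle (input names are hypothetical)."""
    # A reset, or Go being de-asserted, always returns the NEU to NEU_START.
    if reset or not go:
        return NeuState.NEU_START
    if state is NeuState.NEU_START:
        # Leave only once the command controller is in its AWAIT_BUFF state.
        return NeuState.NEU_FILL_BUFF if cc_await_buff else state
    if state is NeuState.NEU_FILL_BUFF:
        # Wait until the first four frames of the previous line are fetched.
        return NeuState.NEU_HOLD if buffer_filled else state
    if state is NeuState.NEU_HOLD:
        # Assumed: a single wait cycle, then start running.
        return NeuState.NEU_RUNNING
    if state is NeuState.NEU_RUNNING:
        # All words for the line (per LineLength) have been received.
        return NeuState.NEU_EMPTY if line_words_done else state
    # NEU_EMPTY: stay here until Go is de-asserted (handled at the top).
    return state
```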

[3141] 24.3.9 Line Fill Unit Sub-Block Description

[3142] The Line Fill Unit, LFU, is responsible for filling the next line buffer in the SFU. The SFU receives the data in blocks of sixteen bits. The LFU uses the color and a0 provided by the Command Controller and, when it has put together a complete 16-bit frame, writes it out to the SFU. The LBD signals to the SFU that the data is valid by strobing the lbd_sfu_wdatavalid signal.

[3143] When the LFU is at the end of the line for the current line data it strobes lbd_sfu_advline to indicate to the SFU that the end of the line has occurred.

[3144] A dataflow block diagram of the line fill unit is shown in FIG. 154.

[3145] The dataflow above has the following blocks:

[3146] 24.3.9.1 State Machine

[3147] The following is an explanation of all the states that the LFU state machine utilizes.

[3148] i LFU_START

[3149] This is the state that the LFU enters when a hard or soft reset occurs or when Go has been de-asserted. This state cannot be left until the reset has been removed, Go has been asserted and the LFU detects that a0 is no longer zero. This only occurs once the command controller starts processing data from the Next Edge Unit, NEU.

[3150] ii LFU_NEW_REG

[3151] LFU_NEW_REG is only entered at the beginning of a new frame. It can remain in this state on subsequent cycles if a whole frame is completed in one clock cycle. If the frame is completed the LFU will output the data to the SFU with the write enable signal. However, if a frame is not completed in one clock cycle the state machine will change to the LFU_COMPLETE_REG state to complete the remainder of the frame. LFU_NEW_REG handles all the lbd_sfu_wdata writes and asserts lbd_sfu_wdatavalid as necessary.

[3152] iii LFU_COMPLETE_REG

[3153] LFU_COMPLETE_REG fills out all the remaining parts of the frame that were not completed in the first clock cycle. The command controller supplies the a0 value and the color, and the state machine uses these to derive the limit and color_sel_16bit_lf, which the line_fill_data block needs to construct a frame. Limit is the four lower significant bits of a0 and color_sel_16bit_lf is a 16-bit wide mask of sd_color. The state machine also maintains a check on the upper eleven bits of a0. If these increment from one clock cycle to the next, a frame is completed and the data can be written to the SFU. In the case of the LineLength being reached the Line Fill Unit fills out the remaining part of the frame with the color of the last bit in the line that was decoded.

[3154] 24.3.9.2 line_fill_data

[3155] line_fill_data takes the limit value and the color_sel_16bit_lf values and constructs the current frame that the command controller and the next edge unit are decoding. The following pseudocode illustrates the logic followed by line_fill_data. work_sfu_wdata is exported by the LBD to the SFU as lbd_sfu_wdata.

if (lfu_state == LFU_START) OR (lfu_state == LFU_NEW_REG) then
    work_sfu_wdata = color_sel_16bit_lf
else
    work_sfu_wdata[(15 − limit) downto limit] = color_sel_16bit_lf[(15 − limit) downto limit]

[3156] 25 Spot FIFO Unit (SFU)

[3157] 25.1 Overview

[3158] The Spot FIFO Unit (SFU) provides the means by which data is transferred between the LBD and the HCU. By abstracting the buffering mechanism and controls from both units, the interface is clean between the data user and the data generator. The amount of buffering can also be increased or decreased without affecting either the LBD or HCU. Scaling of data is performed in the horizontal and vertical directions by the SFU so that the output to the HCU matches the printer resolution. Non-integer scaling is supported in both the horizontal and vertical directions. Typically, the scale factor will be the same in both directions but may be programmed to be different.

[3159] 25.2 Main Features of the SFU

[3160] The SFU replaces the Spot Line Buffer Interface (SLBI) in PEC1. The spot line store is now located in DRAM.

[3161] The SFU outputs the previous line to the LBD, stores the next line produced by the LBD and outputs the HCU read line. Each interface to DRAM is via a feeder FIFO. The LBD interfaces to the SFU with a data width of 16 bits. The SFU interfaces to the HCU with a data width of 1 bit. Since the DRAM word width is 256 bits but the LBD line length is a multiple of 16 bits, a capability to flush the last multiples of 16 bits at the end of a line into a 256-bit DRAM word size is required. Therefore, SFU reads of DRAM words at the end of a line, which do not fill the DRAM word, will already be padded.

[3162] A signal sfu_lbd_rdy to the LBD indicates that the SFU is available for writing and reading. For the first LBD line after SFU Go has been asserted, previous line data is not supplied until after the first lbd_sfu_advline strobe from the LBD (zero data is supplied instead), and sfu_lbd_rdy to the LBD indicates that the SFU is available for writing. lbd_sfu_advline tells the SFU to advance to the next line. lbd_sfu_pladvword tells the SFU to supply the next 16 bits of previous line data. Until the number of lbd_sfu_pladvword strobes received is equivalent to the LBD line length, sfu_lbd_rdy indicates that the SFU is available for both reading and writing. Thereafter it indicates the SFU is available for writing. The LBD should not generate lbd_sfu_pladvword or lbd_sfu_advline strobes until sfu_lbd_rdy is asserted.

[3163] A signal sfu_hcu_avail indicates that the SFU has data to supply to the HCU. Another signal hcu_sfu_advdot, from the HCU, tells the SFU to supply the next dot. The HCU should not generate the hcu_sfu_advdot signal until sfu_hcu_avail is true. The HCU can therefore stall waiting for the sfu_hcu_avail signal.

[3164] X and Y non-integer scaling of the bi-level dot data is performed in the SFU.

[3165] At 1600 dpi the SFU requires 1 dot per cycle for all DRAM channels, 3 dots per cycle in total (read+read+write). Therefore the SFU requires two 256-bit DRAM read accesses per 256 cycles and 1 write access every 256 cycles. A single DIU read interface will be shared for reading the current and previous lines from DRAM.
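The access rates quoted above follow directly from the 256-bit DRAM word size. The short Python calculation below is an illustrative check only; the helper names are hypothetical and the figures are the ones stated in the preceding paragraph.

```python
DRAM_WORD_BITS = 256  # one DRAM access transfers a 256-bit word


def bits_per_cycle(accesses: int, cycles: int) -> float:
    """Average bandwidth for a number of 256-bit accesses spread over `cycles`."""
    return accesses * DRAM_WORD_BITS / cycles


def cycles_between_accesses(accesses: int, cycles: int) -> float:
    """Spacing between accesses when they are spread evenly over `cycles`."""
    return cycles / accesses


read_bw = bits_per_cycle(2, 256)    # previous line + HCU read line
write_bw = bits_per_cycle(1, 256)   # next line written by the LBD
print(read_bw, write_bw, read_bw + write_bw)
print(cycles_between_accesses(2, 256))  # shared read interface spacing
```

The total of 3 bits per cycle corresponds to the 3 dots per cycle (read+read+write) given in the text.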

[3166] 25.3 Bi-Level DRAM Memory Buffer Between LBD, SFU and HCU

[3167] FIG. 158 shows a bi-level buffer store in DRAM. FIG. 158(a) shows the LBD previous line address reading after the HCU read line address in DRAM. FIG. 158(b) shows the LBD previous line address reading before the HCU read line address in DRAM.

[3168] Although the LBD and HCU read and write complete lines of data, the bi-level DRAM buffer is not line based. The buffering between the LBD, SFU and HCU is a FIFO of programmable size. The only line based concept is that the line the HCU is currently reading cannot be over-written because it may need to be re-read for scaling purposes.

[3169] The SFU interfaces to DRAM via three FIFOs:

[3170] a. The HCUReadLineFIFO which supplies dot data to the HCU.

[3171] b. The LBDNextLineFIFO which writes decompressed bi-level data from the LBD.

[3172] c. The LBDPrevLineFIFO which reads previous decompressed bi-level data for the LBD.

[3173] There are four address pointers used to manage the bi-level DRAM buffer:

[3174] a. hcu_readline_rd_adr[21:5] is the read address in DRAM for the HCUReadLineFIFO.

[3175] b. hcu_startreadline_adr[21:5] is the start address in DRAM for the current line being read by the HCUReadLineFIFO.

[3176] c. lbd_nextline_wr_adr[21:5] is the write address in DRAM for the LBDNextLineFIFO.

[3177] d. lbd_prevline_rd_adr[21:5] is the read address in DRAM for the LBDPrevLineFIFO.

[3178] The address pointers must obey certain rules which indicate whether they are valid:

[3179] a. hcu_readline_rd_adr is only valid if it is reading earlier in the line than lbd_nextline_wr_adr is writing, i.e. the FIFO is not empty.

[3180] b. The SFU (lbd_nextline_wr_adr) cannot overwrite the current line that the HCU is reading from (hcu_startreadline_adr), i.e. the FIFO is not full when compared with the HCU read line pointer.

[3181] c. The LBDNextLineFIFO (lbd_nextline_wr_adr) must be writing earlier in the line than LBDPrevLineFIFO (lbd_prevline_rd_adr) is reading and must not overwrite the current line that the HCU is reading from, i.e. the FIFO is not full when compared to the PrevLineFifo read pointer.

[3182] d. The LBDPrevLineFIFO (lbd_prevline_rd_adr) can read right up to the address that LBDNextLineFIFO (lbd_nextline_wr_adr) is writing, i.e. the FIFO is not empty.

[3183] e. At startup, i.e. when sfu_go is asserted, the pointers are reset to start_sfu_adr[21:5].

[3184] f. The address pointers can wrap around the SFU bi-level store area in DRAM.

[3185] As a guideline, the typical FIFO size should be a minimum of 2 lines stored in DRAM, nominally 3 lines, up to a programmable number of lines. A larger buffer allows lines to be decompressed in advance. This can be useful for absorbing local complexities in compressed bi-level images.
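Rules e and f above can be illustrated with a small Python sketch of a pointer that starts at the beginning of the bi-level store and wraps at its end. This is an assumption-laden illustration: the function name is hypothetical, the pointers are treated as plain integers counting 256-bit words, and the end address is assumed to be the last valid location of the store.

```python
def advance_pointer(adr: int, start_adr: int, end_adr: int) -> int:
    """Advance a 256-bit word pointer by one location, wrapping at the end.

    start_adr and end_adr are the first and last locations of the store
    (end_adr inclusive, which is an assumption of this sketch).
    """
    if not start_adr <= adr <= end_adr:
        raise ValueError("pointer outside the bi-level store area")
    return start_adr if adr == end_adr else adr + 1


# A store of 4 words at locations 0x100..0x103: the pointer wraps after 0x103.
ptr = 0x100
visited = []
for _ in range(6):
    visited.append(ptr)
    ptr = advance_pointer(ptr, 0x100, 0x103)
print([hex(a) for a in visited])
```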

[3186] 25.4 DRAM Access Requirements

[3187] The SFU has 1 read interface to the DIU and 1 write interface. The read interface is shared between the previous and current line read FIFOs.

[3188] The spot line store requires 5.1 Kbytes of DRAM to store 3 A4 lines. The SFU will read and write the spot line store in single 256-bit DRAM accesses. The SFU will need 256-bit double buffers for each of its previous, current and next line interfaces.

[3189] The SFU's DIU bandwidth requirements are summarized in Table 161.

TABLE 161 DRAM bandwidth requirements
Direction | Maximum number of cycles between each 256-bit DRAM access | Peak Bandwidth required to be supported by DIU (bits/cycle) | Average Bandwidth (bits/cycle)
Read | 128¹ | 2 | 2
Write | 256² | 1 | 1

[3190] 25.5 Scaling

[3191] Scaling of bi-level data is performed in both the horizontal and vertical directions by the SFU so that the output to the HCU matches the printer resolution. The SFU supports non-integer scaling with the scale factor represented by a numerator and a denominator. Only scaling up of the bi-level data is allowed, i.e. the numerator should be greater than or equal to the denominator. Scaling is implemented using a counter as described in the pseudocode below. An advance pulse is generated to move to the next dot (x-scaling) or line (y-scaling).

if (count + denominator >= numerator) then
    count = (count + denominator) − numerator
    advance = 1
else
    count = count + denominator
    advance = 0

[3192] X scaling controls whether the SFU supplies the next dot or a copy of the current dot when the HCU asserts hcu_sfu_advdot. The SFU counts the number of hcu_sfu_advdot signals from the HCU. When the SFU has supplied an entire HCU line of data, the SFU will either re-read the current line from DRAM or advance to the next line of HCU read data depending on the programmed Y scale factor.

[3193] An example of scaling for numerator=7 and denominator=3 is given in Table 162. The signal advance, if asserted, causes the next input dot to be output on the next cycle; otherwise the same input dot is output.

TABLE 162 Non-integer scaling example for scaleNum = 7, scaleDenom = 3
count | advance | dot
0 | 0 | 1
3 | 0 | 1
6 | 1 | 1
2 | 0 | 2
5 | 1 | 2
1 | 0 | 3
4 | 1 | 3
0 | 0 | 4
3 | 0 | 4
6 | 1 | 4
2 | 0 | 5
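The scaling counter can be exercised in software to reproduce the rows of Table 162. The Python function below is a minimal sketch of the counter described in the pseudocode; the function name, the start_count parameter (the initial counter value, 0 for the table) and the tuple layout are illustrative assumptions.

```python
def scale_sequence(numerator, denominator, steps, start_count=0):
    """Return (count, advance, dot) rows for `steps` output cycles.

    `count` is the counter value at the start of the cycle, `advance` is the
    pulse generated in that cycle and `dot` is the input dot being output.
    """
    if numerator < denominator:
        raise ValueError("only scaling up is allowed")
    count, dot, rows = start_count, 1, []
    for _ in range(steps):
        if count + denominator >= numerator:
            next_count = (count + denominator) - numerator
            advance = 1
        else:
            next_count = count + denominator
            advance = 0
        rows.append((count, advance, dot))
        count = next_count
        dot += advance  # an asserted advance moves to the next input dot
    return rows


for row in scale_sequence(7, 3, 11):
    print(row)
```

With numerator 7 and denominator 3, input dots 1, 2 and 3 are output 3, 2 and 2 times respectively, so every 3 input dots produce 7 output dots.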

[3194] 25.6 Lead-In and Lead-Out Clipping

[3195] To account for the case where there may be two SoPEC devices, each generating its own portion of a dot-line, the first dot in a line may not be replicated the total scale-factor number of times by an individual SoPEC. The dot will ultimately be scaled up correctly with both devices doing part of the scaling, one on its lead-out and the other on its lead-in. Scaled-up dots on the lead-out, i.e. which go beyond the HCU line length, will be ignored. Scaling on the lead-in, i.e. of the first valid dot in the line, is controlled by setting the XstartCount register.

[3196] At the start of each line, count in the pseudocode above is set to XstartCount. If there is no lead-in, XstartCount is set to 0, i.e. the first value of count in Table 162. If there is lead-in then XstartCount needs to be set to the appropriate value of count in the sequence above.

[3197] 25.7 Interfaces Between LBD, SFU and HCU

[3198] 25.7.1 LBD-SFU Interfaces

[3199] The LBD has two interfaces to the SFU. The LBD writes the next line to the SFU and reads the previous line from the SFU.

[3200] 25.7.1.1 LBDNextLineFIFO Interface

[3201] The LBDNextLineFIFO interface from the LBD to the SFU comprises the following signals:

[3202] lbd_sfu_wdata, 16-bit write data.

[3203] lbd_sfu_wdatavalid, write data valid.

[3204] lbd_sfu_advline, signal indicating LBD has advanced to the next line.

[3205] The LBD should not write to the SFU until sfu_lbd_rdy is true. The LBD can therefore stall waiting for the sfu_lbd_rdy signal.

[3206] 25.7.1.2 LBDPrevLineFIFO Interface

[3207] The LBDPrevLineFIFO interface from the SFU to the LBD comprises the following signals:

[3208] sfu_lbd_pldata, 16-bit data.

[3209] The previous line read buffer interface from the LBD to the SFU comprises the following signals:

[3210] lbd_sfu_pladvword, signal indicating to the SFU to supply the next 16-bit word.

[3211] lbd_sfu_advline, signal indicating LBD has advanced to the next line.

[3212] Previous line data is not supplied until after the first lbd_sfu_advline strobe from the LBD (zero data is supplied instead). The LBD should not assert lbd_sfu_pladvword unless sfu_lbd_rdy is asserted.

[3213] 25.7.1.3 Common Control Signals

[3214] sfu_lbd_rdy indicates to the LBD that the SFU is available for writing. After the first lbd_sfu_advline and before the number of lbd_sfu_pladvword strobes received is equivalent to the LBD line length, sfu_lbd_rdy indicates that the SFU is available for both reading and writing.

[3215] Thereafter it indicates the SFU is available for writing.

[3216] The LBD should not generate lbd_sfu_pladvword or lbd_sfu_advline strobes until sfu_lbd_rdy is asserted.

[3217] 25.7.2 SFU-HCU Current Line FIFO Interface

[3218] The interface from the SFU to the HCU comprises the following signals:

[3219] sfu_hcu_sdata, 1-bit data.

[3220] sfu_hcu_avail, data valid signal indicating that there is data available in the SFU HCUReadLineFIFO.

[3221] The interface from HCU to SFU comprises the following signals:

[3222] hcu_sfu_advdot, indicating to the SFU to supply the next dot.

[3223] The HCU should not generate the hcu_sfu_advdot signal until sfu_hcu_avail is true. The HCU can therefore stall waiting for the sfu_hcu_avail signal.

[3224] 25.8 Implementation

[3225] 25.8.1 Definitions of IO

TABLE 163 SFU Port List
Port Name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | SoPEC Functional clock.
prst_n | 1 | In | Global reset signal.
DIU Read Interface signals
sfu_diu_rreq | 1 | Out | SFU requests DRAM read. A read request must be accompanied by a valid read address.
sfu_diu_radr[21:5] | 17 | Out | Read address to DIU 17 bits wide (256-bit aligned word).
diu_sfu_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on sfu_diu_radr.
diu_data[63:0] | 64 | In | Data from DIU to SoPEC Units. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
diu_sfu_rvalid | 1 | In | Signal from DIU telling SoPEC Unit that valid read data is on the diu_data bus.
DIU Write Interface signals
sfu_diu_wreq | 1 | Out | SFU requests DRAM write. A write request must be accompanied by a valid write address together with valid write data and a write valid.
sfu_diu_wadr[21:5] | 17 | Out | Write address to DIU 17 bits wide (256-bit aligned word).
diu_sfu_wack | 1 | In | Acknowledge from DIU that write request has been accepted and new write address can be placed on sfu_diu_wadr.
sfu_diu_data[63:0] | 64 | Out | Data from SFU to DIU. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
sfu_diu_wvalid | 1 | Out | Signal from PEP Unit indicating that data on sfu_diu_data is valid.
PCU Interface data and control signals
pcu_adr[5:2] | 4 | In | PCU address bus. Only 4 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
sfu_pcu_datain[31:0] | 32 | Out | Read data bus from the SFU to the PCU.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_sfu_sel | 1 | In | Block select from the PCU. When pcu_sfu_sel is high both pcu_adr and pcu_dataout are valid.
sfu_pcu_rdy | 1 | Out | Ready signal to the PCU. When sfu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on sfu_pcu_datain is valid.
LBD Interface Data and Control Signals
sfu_lbd_rdy | 1 | Out | Signal indication that SFU has previous line data available and is ready to be written to.
lbd_sfu_advline | 1 | In | Line advance signal for both next and previous lines.
lbd_sfu_pladvword | 1 | In | Advance word signal for previous line buffer.
sfu_lbd_pldata[15:0] | 16 | Out | Data from the previous line buffer.
lbd_sfu_wdata[15:0] | 16 | In | Write data for next line buffer.
lbd_sfu_wdatavalid | 1 | In | Write data valid signal for next line buffer data.
HCU Interface Data and Control Signals
hcu_sfu_advdot | 1 | In | Signal indicating to the SFU that the HCU is ready to accept the next dot of data from SFU.
sfu_hcu_sdata | 1 | Out | Bi-level dot data.
sfu_hcu_avail | 1 | Out | Signal indicating valid bi-level dot data on sfu_hcu_sdata.

[3226] 25.8.2 Configuration Registers

TABLE 164 SFU Configuration Registers
Address (SFU_base +) | Register name | # bits | Value on reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the SFU. This register can be read to indicate the reset state: 0 - reset in progress, 1 - reset not in progress.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the SFU. Writing 0 to this register halts the SFU. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The SFU must be started before the LBD is started. This register can be read to determine if the SFU is running (1 - running, 0 - stopped).
Setup registers (constant during processing of the page)
0x08 | HCUNumDots | 16 | 0x0000 | Width of HCU line (in dots).
0x0C | HCUDRAMWords | 8 | 0x00 | Number of 256-bit DRAM words in a HCU line − 1.
0x10 | LBDDRAMWords | 8 | 0x00 | Number of 256-bit words in a LBD line − 1. (LBD line length must be at least 128 bits).
0x14 | StartSfuAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | First SFU location in memory.
0x18 | EndSfuAdr[21:5] (256-bit aligned DRAM address) | 17 | 0x00000 | Last SFU location in memory.
0x1C | XstartCount | 8 | 0x00 | Value to be loaded at the start of every line into the counter used for scaling in the X direction. Used to control the scaling of the first dot in a line. This value will typically equal zero, except in the case where a number of dots are clipped on the lead-in to a line. XstartCount must be programmed to be less than the XscaleNum value.
0x20 | XscaleNum | 8 | 0x01 | Numerator of spot data scale factor in X direction.
0x24 | XscaleDenom | 8 | 0x01 | Denominator of spot data scale factor in X direction.
0x28 | YscaleNum | 8 | 0x01 | Numerator of spot data scale factor in Y direction.
0x2C | YscaleDenom | 8 | 0x01 | Denominator of spot data scale factor in Y direction.
Work registers (PCU has read-only access)
0x30 | HCUReadLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Current address pointer in DRAM to HCU read data. Read only register.
0x34 | HCUStartReadLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Start address in DRAM of line being read by HCU buffer in DRAM. Read only register.
0x38 | LBDNextLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Current address pointer in DRAM to LBD write data. Read only register.
0x3C | LBDPrevLineAdr[21:5] (256-bit aligned DRAM address) | 17 | - | Current address pointer in DRAM to LBD read data. Read only register.
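The HCUDRAMWords and LBDDRAMWords registers in Table 164 both hold a count of 256-bit DRAM words minus one. The Python helper below is a hedged sketch of how such a value might be derived from a line length in bits; the helper name is hypothetical, and rounding a partial word up to a whole word is an assumption based on the padding behaviour described earlier.

```python
DRAM_WORD_BITS = 256   # DRAM word width
MAX_REGISTER = 0xFF    # both registers are 8 bits wide


def dram_words_register(line_bits: int) -> int:
    """Return (number of 256-bit words needed for the line) - 1."""
    if line_bits <= 0:
        raise ValueError("line length must be positive")
    words = -(-line_bits // DRAM_WORD_BITS)   # ceiling division
    value = words - 1
    if value > MAX_REGISTER:
        raise ValueError("line does not fit in an 8-bit register value")
    return value


# An HCU line of 13824 dots is 13824 bits (1 bit per dot): 54 words, so 53.
print(dram_words_register(13824))
# An LBD line of 128 bits (the stated minimum) still occupies one word.
print(dram_words_register(128))
```

The 13824-dot line width is an arbitrary example value, not a figure taken from the specification.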

[3227] 25.8.3 SFU Sub-Block Partition

[3228] The SFU contains a number of sub-blocks:

Name | Description
PCU Interface | PCU interface, configuration and status registers. Also generates the Go and the Reset signals for the rest of the SFU.
LBD Previous Line FIFO | Contains FIFO which is read by the LBD previous line interface.
LBD Next Line FIFO | Contains FIFO which is written by the LBD next line interface.
HCU Read Line FIFO | Contains FIFO which is read by the HCU interface.
DIU Interface and Address Generator | Contains DIU read interface and DIU write interface. Manages the address pointers for the bi-level DRAM buffer. Contains X and Y scaling logic.

[3229] The various FIFO sub-blocks have no knowledge of where in DRAM their read or write data is stored. In this sense the FIFO sub-blocks are completely de-coupled from the bi-level DRAM buffer. All DRAM address management is centralised in the DIU Interface and Address Generation sub-block. DRAM access is pre-emptive, i.e. after a FIFO unit has made an access then as soon as the FIFO has space to read or data to write a DIU access will be requested immediately. This ensures there are no unnecessary stalls introduced, e.g. at the end of an LBD or HCU line.

[3230] There now follows a description of the SFU sub-blocks.

[3231] 25.8.4 PCU Interface Sub-Block

[3232] The PCU interface sub-block provides for the CPU to access SFU specific registers by reading or writing to the SFU address space.

[3233] 25.8.5 LBDPrevLineFIFO Sub-Block

TABLE 165 LBDPrevLineFIFO Additional IO Definitions
Port Name | Pins | I/O | Description
Internal Output
plf_rdy | 1 | Out | Signal indicating LBDPrevLineFIFO is ready to be read from. Until the first lbd_sfu_advline for a band has been received and after the number of reads from DRAM for a line is equal to LBDDRAMWords, plf_rdy is always asserted. During the second and subsequent lines plf_rdy is deasserted whenever the LBDPrevLineFIFO has one word left in the FIFO.
DIU and Address Generation sub-block Signals
plf_diurreq | 1 | Out | Signal indicating the LBDPrevLineFIFO has 256-bits of data free.
plf_diurack | 1 | In | Acknowledge that read request has been accepted and plf_diurreq should be de-asserted.
plf_diurdata | 1 | In | Data from the DIU to LBDPrevLineFIFO. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
plf_diurvalid | 1 | In | Signal indicating data on plf_diurdata is valid.
plf_diuidle | 1 | Out | Signal indicating DIU state-machine is in the IDLE state.

[3234] 25.8.5.1 General Description

[3235] The LBDPrevLineFIFO sub-block comprises a double 256-bit buffer between the LBD and the DIU Interface and Address Generator sub-block. The FIFO is implemented as 8 times 64-bit words. The FIFO is written by the DIU Interface and Address Generator sub-block and read by the LBD.

[3236] Whenever 4 locations in the FIFO are free the FIFO will request 256 bits of data from the DIU Interface and Address Generation sub-block by asserting plf_diurreq. A signal plf_diurack indicates that the request has been accepted and plf_diurreq should be de-asserted.

[3237] The data is written to the FIFO as 64 bits on plf_diurdata[63:0] over 4 clock cycles. The signal plf_diurvalid indicates that the data returned on plf_diurdata[63:0] is valid. plf_diurvalid is used to generate the FIFO write enable, write_en, and to increment the FIFO write address, write_adr[2:0]. If the LBDPrevLineFIFO still has 256 bits free then plf_diurreq should be asserted again.

[3238] The DIU Interface and Address Generation sub-block handles all address pointer management and DIU interfacing and decides whether to acknowledge a request for data from the FIFO.

[3239] The state diagram of the LBDPrevLineFIFO DIU Interface is shown in FIG. 163. If sfu_go is deasserted then the state-machine returns to its idle state.

[3240] The LBD reads 16-bit wide data from the LBDPrevLineFIFO on sfu_lbd_pldata[15:0]. lbd_sfu_pladvword from the LBD tells the LBDPrevLineFIFO to supply the next 16-bit word. The FIFO control logic generates a signal word_select which selects the next 16 bits of the 64-bit FIFO word to output on sfu_lbd_pldata[15:0]. When the entire current 64-bit FIFO word has been read by the LBD, lbd_sfu_pladvword will cause the next word to be popped from the FIFO. Previous line data is not supplied until after the first lbd_sfu_advline strobe from the LBD after sfu_go is asserted (zero data is supplied instead). Until the first lbd_sfu_advline strobe after sfu_go, lbd_sfu_pladvword strobes are ignored.

[3241] The LBDPrevLineFIFO control logic uses a counter, pl_count[7:0], to count the number of DRAM read accesses for the line. When the pl_count counter is equal to LBDDRAMWords, a complete line of data has been read by the LBD, plf_rdy is set high, and the counter is reset. plf_rdy remains high until the next lbd_sfu_advline strobe from the LBD. On receipt of the lbd_sfu_advline strobe the remaining data in the 256-bit word in the FIFO is ignored, and the FIFO read_adr is rounded up if required.

[3242] The LBDPrevLineFIFO generates a signal plf_rdy to indicate that it has data available. Until the first lbd_sfu_advline for a band has been received and after the number of DRAM reads for a line is equal to LBDDRAMWords, plf_rdy is always asserted. During the second and subsequent lines plf_rdy is deasserted whenever the LBDPrevLineFIFO has one word left.

[3243] The last 256-bit word for a line read from DRAM can contain extra padding which should not be output to the LBD. This is because the number of 16-bit words per line may not fit exactly into a 256-bit DRAM word. When the count of the number of DRAM reads for a line is equal to lbd_dram_words the LBDPrevLineFIFO must adjust the FIFO write address to point to the next 256-bit word boundary in the FIFO for the next line of data. At the end of a line the read address must round up to the nearest 256-bit word boundary and ignore the remaining 16-bit words. This can be achieved by noting that the FIFO read address, read_adr[2:0], requires 3 bits to address 8 locations of 64 bits. The next 256-bit aligned address is calculated by inverting the MSB of the read_adr and setting all other bits to 0.

if (read_adr[1:0] /= b00 AND lbd_sfu_advline == 1) then
    read_adr[1:0] = b00
    read_adr[2] = ~read_adr[2]
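The rounding rule can be checked with a few concrete addresses. The Python function below is an illustrative sketch of the pseudocode; the function name is hypothetical and the 3-bit read address is modelled as a plain integer in the range 0 to 7.

```python
def round_up_read_adr(read_adr: int, lbd_sfu_advline: bool) -> int:
    """Round a 3-bit FIFO read address up to the next 256-bit boundary.

    The FIFO holds 8 x 64-bit words, so each 256-bit half starts at
    address 0b000 or 0b100. Already-aligned addresses are left unchanged.
    """
    if lbd_sfu_advline and (read_adr & 0b011) != 0:
        # Invert the MSB and clear the two low bits.
        return (read_adr & 0b100) ^ 0b100
    return read_adr


for adr in (0b001, 0b011, 0b100, 0b110):
    print(format(adr, "03b"), "->", format(round_up_read_adr(adr, True), "03b"))
```

Addresses in the lower half (0b001 to 0b011) move to 0b100, and addresses in the upper half (0b101 to 0b111) wrap to 0b000.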

[3244] 25.8.6 LBDNextLineFIFO Sub-Block

TABLE 166 LBDNextLineFIFO Additional IO Definition
Port Name | Pins | I/O | Description
LBDNextLineFIFO Interface Signals
nlf_rdy | 1 | Out | Signal indicating LBDNextLineFIFO is ready to be written to i.e. there is space in the FIFO.
DIU and Address Generation sub-block Signals
nlf_diuwreq | 1 | Out | Signal indicating the LBDNextLineFIFO has 256-bits of data for writing to the DIU.
nlf_diuwack | 1 | In | Acknowledge from DIU that write request has been accepted and write data can be output on nlf_diuwdata together with nlf_diuwvalid.
nlf_diuwdata | 1 | Out | Data from LBDNextLineFIFO to DIU Interface. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
nlf_diuwvalid | 1 | In | Signal indicating that data on nlf_diuwdata is valid.

[3245] 25.8.6.1 General Description

[3246] The LBDNextLineFIFO sub-block comprises a double 256-bit buffer between the LBD and the DIU Interface and Address Generator sub-block. The FIFO is implemented as 8 times 64-bit words. The FIFO is written by the LBD and read by the DIU Interface and Address Generator. Whenever 4 locations in the FIFO are full the FIFO will request 256 bits of data to be written to the DIU Interface and Address Generator by asserting nlf_diuwreq. A signal nlf_diuwack indicates that the request has been accepted and nlf_diuwreq should be de-asserted. On receipt of nlf_diuwack, the data is sent to the DIU Interface as 64 bits on nlf_diuwdata[63:0] over 4 clock cycles. The signal nlf_diuwvalid indicates that the data on nlf_diuwdata[63:0] is valid. nlf_diuwvalid should be asserted with the smallest latency after nlf_diuwack. If the LBDNextLineFIFO still has 256 bits more to transfer then nlf_diuwreq should be asserted again. The state diagram of the LBDNextLineFIFO DIU Interface is shown in FIG. 166. If sfu_go is deasserted then the state-machine returns to its Idle state.

[3247] The signal nlf_rdy indicates that the LBDNextLineFIFO has space for writing by the LBD. The LBD writes 16-bit wide data supplied on lbd_sfu_wdata[15:0]. lbd_sfu_wdatavalid indicates that the data is valid.

[3248] The LBDNextLineFIFO control logic counts the number of lbd_sfu_wdatavalid signals and uses the count to correctly address into the next line FIFO. The lbd_sfu_wdatavalid counter is rounded up to the nearest 256-bit word when a lbd_sfu_advline strobe is received from the LBD. Any data remaining in the FIFO is flushed to DRAM with padding being added to fill a complete 256-bit word.

[3249] 25.8.7 sfu_lbd_rdy Generation

[3250] The signal sfu_lbd_rdy is generated by ANDing plf_rdy from the LBDPrevLineFIFO and nlf_rdy from the LBDNextLineFIFO.

[3251] sfu_lbd_rdy indicates to the LBD that the SFU is available for writing, i.e. there is space available in the LBDNextLineFIFO. After the first lbd_sfu_advline and before the number of lbd_sfu_pladvword strobes received is equivalent to the line length, sfu_lbd_rdy indicates that the SFU is available for both reading, i.e. there is data in the LBDPrevLineFIFO, and writing.

[3252] Thereafter it indicates the SFU is available for writing.

[3253] 25.8.8 LBD-SFU Interfaces Timing Waveform Description

[3254] FIG. 167 and FIG. 168 show the timing of the data valid and ready signals between the SFU and LBD. A diagram and pseudocode are given for both the read and write interfaces between the SFU and LBD.

[3255] 25.8.8.1 LBD-SFU Write Interface Timing

[3256] The main points to note from FIG. 167 are:

[3257] In clock cycle 1 the SFU detects that it has space to receive only 2 more 16-bit words from the LBD after the current clock cycle.

[3258] The data on lbd_sfu_wdata is valid and this is indicated by lbd_sfu_wdatavalid being asserted.

[3259] In clock cycle 2 sfu_lbd_rdy is deasserted; however, the LBD cannot react to this signal until clock cycle 3. So in clock cycle 3 there is also valid data from the LBD, which consumes the last available location in the FIFO in the SFU (FIFO free level is zero).

[3260] In clock cycles 4 and 5 the FIFO is read and 2 words become free in the FIFO.

[3261] In cycle 4 the SFU determines that the FIFO has more room and asserts the ready signal on the next cycle.

[3262] The LBD has entered a pause mode and waits for sfu_lbd_rdy to be asserted again. In cycle 5 the LBD sees the asserted ready signal and responds by writing one unit into the FIFO in cycle 6.

[3263] The SFU detects it has 2 spaces left in the FIFO and the current cycle is an active write (same as in cycle 1), and deasserts the ready on the next cycle.

[3264] In cycle 7 the LBD did not have data to write into the FIFO, and so the FIFO remains with one space left.

[3265] The SFU toggles the ready signal every second cycle; this allows the LBD to write one unit at a time to the FIFO.

[3266] In cycle 9 the LBD responds to the single ready pulse by writing into the FIFO and consuming the last remaining free unit.

[3267] The write interface pseudocode for generating the ready is:

    // ready generation pseudocode
    if (fifo_free_level > 2) then
        nlf_rdy = 1
    elsif (fifo_free_level == 2) then
        if (lbd_sfu_wdatavalid == 1) then
            nlf_rdy = 0
        else
            nlf_rdy = 1
    elsif (fifo_free_level == 1) then
        if (lbd_sfu_wdatavalid == 1) then
            nlf_rdy = 0
        else
            nlf_rdy = NOT(sfu_lbd_rdy)
    else
        nlf_rdy = 0
    sfu_lbd_rdy = (nlf_rdy AND plf_rdy)
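The ready rule above can be exercised in software. The following is a minimal Python sketch, not part of the described hardware: the function names next_nlf_rdy and next_sfu_lbd_rdy are hypothetical, and each call models one combinational evaluation of the pseudocode using the current FIFO free level, the write-valid strobe and the current value of sfu_lbd_rdy.

```python
def next_nlf_rdy(fifo_free_level, wdatavalid, sfu_lbd_rdy):
    """One evaluation of the write-side ready rule (hypothetical helper).

    fifo_free_level: number of free 16-bit locations in the next line FIFO
    wdatavalid:      True when lbd_sfu_wdatavalid is asserted this cycle
    sfu_lbd_rdy:     current value of the combined ready signal
    """
    if fifo_free_level > 2:
        return True
    if fifo_free_level == 2:
        # An active write will leave only one location, so drop ready.
        return not wdatavalid
    if fifo_free_level == 1:
        # With one location left, ready toggles so the LBD writes one unit at a time.
        return False if wdatavalid else (not sfu_lbd_rdy)
    return False


def next_sfu_lbd_rdy(nlf_rdy, plf_rdy):
    """Combined ready is the AND of the write-side and read-side ready signals."""
    return nlf_rdy and plf_rdy
```

With one free location and no write in progress, repeated evaluation alternates the ready value, which matches the toggling behaviour described for FIG. 167.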

[3268] 25.8.8.2 SFU-LBD Read Interface

[3269] The read interface is similar to the write interface except that read data (sfu_lbd_pldata) takes an extra cycle to respond to the data advance signal (lbd_sfu_pladvword).

[3270] It is not possible to read the FIFO totally empty during the processing of a line; one word must always remain in the FIFO. At the end of a line the FIFO can be read totally empty. This functionality is controlled by the SFU with the generation of the plf_rdy signal.

[3271] There is an apparent corner case on the read side which should be highlighted. On examination this turns out not to be an issue.

[3272] Scenario 1:

[3273] sfu_lbd_rdy will go low when there are still 2 pieces of data in the FIFO. If there is a lbd_sfu_pladvword pulse in the next cycle the data will appear on sfu_lbd_pldata[15:0].

[3274] Scenario 2:

[3275] sfu_lbd_rdy will go low when there are still 2 pieces of data in the FIFO. If there is no lbd_sfu_pladvword pulse in the next cycle and it is not the end of the page then the SFU will read the data for the next line from DRAM and the read FIFO will fill more, sfu_lbd_rdy will assert again, and so the data will appear on sfu_lbd_pldata[15:0]. If it happens that the next line of data is not available yet, the sfu_lbd_pldata bus will go invalid until the next line's data is available. The LBD does not sample the sfu_lbd_pldata bus at this time (i.e. after the end of a line) and it is safe to have invalid data on the bus.

[3276] Scenario 3:

[3277] sfu_lbd_rdy will go low when there are still 2 pieces of data in the FIFO. If there is no lbd_sfu_pladvword pulse in the next cycle and it is the end of the page then the SFU will do no more reads from DRAM, sfu_lbd_rdy will remain de-asserted, and the data will not be read out from the FIFO. However, the last line of data on the page is not needed for decoding in the LBD and will not be read by the LBD. So scenario 3 will never apply.

[3278] The pseudocode for the read FIFO ready generation is:

    // ready generation pseudocode
    if (pl_count == lbd_dram_words) then
        plf_rdy = 1
    elsif (fifo_fill_level > 3) then
        plf_rdy = 1
    elsif (fifo_fill_level == 3) then
        if (lbd_sfu_pladvword == 1) then
            plf_rdy = 0
        else
            plf_rdy = 1
    elsif (fifo_fill_level == 2) then
        if (lbd_sfu_pladvword == 1) then
            plf_rdy = 0
        else
            plf_rdy = NOT(sfu_lbd_rdy)
    else
        plf_rdy = 0
    sfu_lbd_rdy = (plf_rdy AND nlf_rdy)

[3279] 25.8.9 HCUReadLineFIFO Sub-Block

TABLE 167 HCUReadLineFIFO Additional IO Definition

    Port Name            Pins  I/O  Description
    DIU and Address Generation sub-block Signals
    hrf_xadvance         1     In   Signal from horizontal scaling unit. 1 - supply the next dot; 0 - supply the current dot.
    hrf_hcu_endofline    1     Out  Signal lasting 1 cycle indicating the end of the HCU read line.
    hrf_diurreq          1     Out  Signal indicating the HCUReadLineFIFO has space for 256-bits of DIU data.
    hrf_diurack          1     In   Acknowledge that read request has been accepted and hrf_diurreq should be de-asserted.
    hrf_diurdata         64    In   Data from DIU to HCUReadLineFIFO. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
    hrf_diurvalid        1     In   Signal indicating data on hrf_diurdata is valid.
    hrf_diuidle          1     Out  Signal indicating DIU state-machine is in the IDLE state.

[3280] 25.8.9.1 General Description

[3281] The HCUReadLineFIFO sub-block comprises a double 256-bit buffer between the HCU and the DIU Interface and Address Generator sub-block. The FIFO is implemented as 8 times 64-bit words. The FIFO is written by the DIU Interface and Address Generator sub-block and read by the HCU.

[3282] The DIU Interface and Address Generation (DAG) sub-block interface of the HCUReadLineFIFO is identical to the LBDPrevLineFIFO DIU interface.

[3283] Whenever 4 locations in the FIFO are free the FIFO will request 256-bits of data from the DAG sub-block by asserting hrf_diurreq. A signal hrf_diurack indicates that the request has been accepted and hrf_diurreq should be de-asserted.

[3284] The data is written to the FIFO as 64-bits on hrf_diurdata[63:0] over 4 clock cycles. The signal hrf_diurvalid indicates that the data returned on hrf_diurdata[63:0] is valid. hrf_diurvalid is used to generate the FIFO write enable, write_en, and to increment the FIFO write address, write_adr[2:0]. If the HCUReadLineFIFO still has 256-bits free then hrf_diurreq should be asserted again.

[3285] The HCUReadLineFIFO generates a signal sfu_hcu_avail to indicate that it has data available for the HCU. The HCU reads single-bit data supplied on sfu_hcu_sdata. The FIFO control logic generates a signal bit select which selects the next bit of the 64-bit FIFO word to output on sfu_hcu_sdata. The signal hcu_sfu_advdot tells the HCUReadLineFIFO to supply the next dot (hrf_xadvance=1) or the current dot (hrf_xadvance=0) on sfu_hcu_sdata according to the hrf_xadvance signal from the scaling control unit in the DAG sub-block. The HCU should not generate the hcu_sfu_advdot signal until sfu_hcu_avail is true. The HCU can therefore stall waiting for the sfu_hcu_avail signal.

[3286] When the entire current 64-bit FIFO word has been read by the HCU, hcu_sfu_advdot will cause the next word to be popped from the FIFO.

[3287] The last 256-bit word for a line read from DRAM and written into the HCUReadLineFIFO can contain dots or extra padding which should not be output to the HCU. A counter in the HCUReadLineFIFO, hcuadvdot_count[15:0], counts the number of hcu_sfu_advdot strobes received from the HCU. When the count equals hcu_num_dots[15:0] the HCUReadLineFIFO must adjust the FIFO read address to point to the next 256-bit word boundary in the FIFO. This can be achieved by noting that the FIFO read address, read_adr[2:0], requires 3 bits to address 8 locations of 64 bits. The next 256-bit aligned address is calculated by inverting the MSB of read_adr and setting all other bits to 0.

    if (hcuadvdot_count == hcu_num_dots) then
        read_adr[1:0] = b00
        read_adr[2] = ~read_adr[2]
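As an illustration of the end-of-line address adjustment, the following minimal Python sketch models read_adr[2:0] as a small integer. The function name align_to_next_256 is hypothetical and the code only mirrors the bit manipulation in the pseudocode above.

```python
def align_to_next_256(read_adr):
    """Return the next 256-bit aligned FIFO read address (hypothetical helper).

    read_adr is a 3-bit address into 8 locations of 64 bits, so the FIFO
    holds two 256-bit words: locations 0-3 and locations 4-7.
    """
    msb = (read_adr >> 2) & 1
    # Invert the MSB to select the other 256-bit word; clear bits [1:0].
    return (msb ^ 1) << 2
```

For example, a read address of 1 (inside the first 256-bit word) moves to 4, and a read address of 6 (inside the second word) moves to 0.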

[3288] The DIU Interface and Address Generator sub-block scaling unit also needs to know when hcuadvdot_count equals hcu_num_dots. This condition is exported from the HCUReadLineFIFO as the signal hrf_hcu_endofline. When hrf_hcu_endofline is asserted the scaling unit will decide, based on vertical scaling, whether to go back to the start of the current line or go on to the next line.

[3289] 25.8.9.2 DRAM Access Limitation

[3290] The SFU must output 1 bit/cycle to the HCU. Since HCUNumDots may not be a multiple of 256 bits, the last 256-bit DRAM word on the line can contain extra zeros. In this case, the SFU may not be able to provide 1 bit/cycle to the HCU. This could lead to a stall by the SFU. This stall could then propagate if the margins being used by the HCU are not sufficient to hide it. The maximum stall can be estimated by the calculation: DRAM service period - (X scale factor * dots used from last DRAM read for HCU line).

[3291] 25.8.10 DIU Interface and Address Generator Sub-Block

[3292] TABLE 168 DIU Interface and Address Generator Additional IO Description

    Port name            Pins  I/O  Description
    Internal LBDPrevLineFIFO Inputs
    plf_diurreq          1     In   Signal indicating the LBDPrevLineFIFO has 256-bits of data free.
    plf_diurack          1     Out  Acknowledge that read request has been accepted and plf_diurreq should be de-asserted.
    plf_diurdata         64    Out  Data from the DIU to LBDPrevLineFIFO. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
    plf_diurrvalid       1     Out  Signal indicating data on plf_diurdata is valid.
    plf_diuidle          1     In   Signal indicating DIU state-machine is in the IDLE state.
    Internal LBDNextLineFIFO Inputs
    nlf_diuwreq          1     In   Signal indicating the LBDNextLineFIFO has 256-bits of data for writing to the DIU.
    nlf_diuwack          1     Out  Acknowledge from DIU that write request has been accepted and write data can be output on nlf_diuwdata together with nlf_diuwvalid.
    nlf_diuwdata         64    In   Data from LBDNextLineFIFO to DIU Interface. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
    nlf_diuwvalid        1     In   Signal indicating that data on nlf_diuwdata is valid.
    Internal HCUReadLineFIFO Inputs
    hrf_hcu_endofline    1     In   Signal lasting 1 cycle indicating the end of the HCU read line.
    hrf_xadvance         1     Out  Signal from horizontal scaling unit. 1 - supply the next dot; 0 - supply the current dot.
    hrf_diurreq          1     In   Signal indicating the HCUReadLineFIFO has space for 256-bits of DIU data.
    hrf_diurack          1     Out  Acknowledge that read request has been accepted and hrf_diurreq should be de-asserted.
    hrf_diurdata         64    Out  Data from DIU to HCUReadLineFIFO. First 64-bits are bits 63:0 of 256 bit word. Second 64-bits are bits 127:64 of 256 bit word. Third 64-bits are bits 191:128 of 256 bit word. Fourth 64-bits are bits 255:192 of 256 bit word.
    hrf_diurvalid        1     Out  Signal indicating data on hrf_diurdata is valid.
    hrf_diuidle          1     In   Signal indicating DIU state-machine is in the IDLE state.

[3293] 25.8.10.1 General Description

[3294] The DIU Interface and Address Generator (DAG) sub-block manages the bi-level buffer in DRAM. It has a DIU Write Interface for the LBDNextLineFIFO and a DIU Read Interface shared between the HCUReadLineFIFO and LBDPrevLineFIFO.

[3295] All DRAM address management is centralised in the DAG. DRAM access is pre-emptive, i.e. after a FIFO unit has made an access, a DIU access will be requested immediately as soon as the FIFO has space to read or data to write. This ensures there are no unnecessary stalls introduced, e.g. at the end of an LBD or HCU line.

[3296] The control logic for horizontal and vertical non-integer scaling is completely contained in the DAG sub-block. The scaling control unit exports the hrf_xadvance signal to the HCUReadLineFIFO, which indicates whether to replicate the current dot or supply the next dot for horizontal scaling.

[3297] 25.8.10.2 DIU Write Interface

[3298] The LBDNextLineFIFO generates all the DIU write interface signals directly except for sfu_diu_wadr[21:5], which is generated by the Address Generation logic. The DIU request from the LBDNextLineFIFO will be negated if its respective address pointer in DRAM is invalid, i.e. nlf_adrvalid=0. The implementation must ensure that no erroneous requests occur on sfu_diu_wreq.

[3299] 25.8.10.3 DIU Read Interface

[3300] Both HCUReadLineFIFO and LBDPrevLineFIFO share the read interface. If both sources request simultaneously, then the arbitration logic implements a round-robin sharing of read accesses between the HCUReadLineFIFO and LBDPrevLineFIFO.

[3301] The DIU read request arbitration logic generates a signal, select_hrfplf, which indicates whether the DIU access is from the HCUReadLineFIFO or LBDPrevLineFIFO (0=HCUReadLineFIFO, 1=LBDPrevLineFIFO). FIG. 171 shows select_hrfplf multiplexing the returned DIU acknowledge and read data to either the HCUReadLineFIFO or LBDPrevLineFIFO.

[3302] The DIU read request arbitration logic is shown in FIG. 172. The arbitration logic will select a DIU read request on hrf_diurreq or plf_diurreq and assert sfu_diu_rreq, which goes to the DIU. The accompanying DIU read address is generated by the Address Generation Logic. The select signal select_hrfplf will be set according to the arbitration winner (0=HCUReadLineFIFO, 1=LBDPrevLineFIFO). sfu_diu_rreq is cleared when the DIU acknowledges the request on diu_sfu_rack. Arbitration cannot take place again until the DIU state-machine of the arbitration winner is in the idle state, indicated by diu_idle. This is necessary to ensure that the DIU read data is multiplexed back to the FIFO that requested it.

[3303] The DIU read requests from the HCUReadLineFIFO and LBDPrevLineFIFO will be negated if their respective addresses in DRAM are invalid, i.e. hrf_adrvalid=0 or plf_adrvalid=0. The implementation must ensure that no erroneous requests occur on sfu_diu_rreq.

[3304] If the HCUReadLineFIFO and LBDPrevLineFIFO request simultaneously and the request does not immediately follow another DIU read port access, the arbitration logic will choose the HCUReadLineFIFO by default. If there are back-to-back requests to the DIU read port then the arbitration logic implements a round-robin sharing of read accesses between the HCUReadLineFIFO and LBDPrevLineFIFO.

[3305] A pseudo-code description of the DIU read arbitration is given below.

    // history is of type {none, hrf, plf}; hrf is HCUReadLineFIFO, plf is LBDPrevLineFIFO
    // initialisation on reset
    select_hrfplf = 0       // default choose hrf
    history = none          // no DIU read access immediately preceding

    // state-machine is busy between asserting sfu_diu_rreq and diu_idle = 1
    // if DIU read requester state-machine is in idle state then de-assert busy
    if (diu_idle == 1) then
        busy = 0
    // if acknowledge received from DIU then de-assert DIU request
    if (diu_sfu_rack == 1) then
        // de-assert request in response to acknowledge
        sfu_diu_rreq = 0
    // if not busy then arbitrate between incoming requests
    // if request detected then assert busy
    if (busy == 0) then
        // if there is no request
        if (hrf_diurreq == 0) AND (plf_diurreq == 0) then
            sfu_diu_rreq = 0
            history = none
        // else there is a request
        else {
            // assert busy and request DIU read access
            busy = 1
            sfu_diu_rreq = 1
            // arbitrate in round-robin fashion between the requestors
            // if only HCUReadLineFIFO requesting choose HCUReadLineFIFO
            if (hrf_diurreq == 1) AND (plf_diurreq == 0) then
                history = hrf
                select_hrfplf = 0
            // if only LBDPrevLineFIFO requesting choose LBDPrevLineFIFO
            if (hrf_diurreq == 0) AND (plf_diurreq == 1) then
                history = plf
                select_hrfplf = 1
            // if both HCUReadLineFIFO and LBDPrevLineFIFO requesting
            if (hrf_diurreq == 1) AND (plf_diurreq == 1) then
                // no immediately preceding request choose HCUReadLineFIFO
                if (history == none) then
                    history = hrf
                    select_hrfplf = 0
                // if previous winner was HCUReadLineFIFO choose LBDPrevLineFIFO
                elsif (history == hrf) then
                    history = plf
                    select_hrfplf = 1
                // if previous winner was LBDPrevLineFIFO choose HCUReadLineFIFO
                elsif (history == plf) then
                    history = hrf
                    select_hrfplf = 0
        // end there is a request
        }
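The winner selection in the arbitration pseudocode can be sketched in Python as follows. This is an illustrative model only: the History enum and the arbitrate function are hypothetical names, the sketch covers only the decision made when busy is 0, and it returns None for the select value when there is no request (the hardware signal would simply hold its previous value).

```python
from enum import Enum


class History(Enum):
    NONE = 0   # no DIU read access immediately preceding
    HRF = 1    # previous winner was the HCUReadLineFIFO
    PLF = 2    # previous winner was the LBDPrevLineFIFO


def arbitrate(hrf_req, plf_req, history):
    """Return (select_hrfplf, new_history) for one arbitration decision.

    select_hrfplf is 0 for HCUReadLineFIFO and 1 for LBDPrevLineFIFO.
    """
    if not hrf_req and not plf_req:
        return None, History.NONE          # no request: history is cleared
    if hrf_req and not plf_req:
        return 0, History.HRF              # only HCUReadLineFIFO requesting
    if plf_req and not hrf_req:
        return 1, History.PLF              # only LBDPrevLineFIFO requesting
    # Both requesting: alternate, defaulting to HCUReadLineFIFO.
    if history is History.HRF:
        return 1, History.PLF
    return 0, History.HRF
```

When both FIFOs request back to back, successive calls that feed the returned history into the next call yield the select sequence 0, 1, 0, 1.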

[3306] 25.8.10.4 Address Generation Logic

[3307] The DIU interface generates the DRAM addresses of data read and written by the SFU's FIFOs.

[3308] A write request from the LBDNextLineFIFO on nlf_diuwreq causes a write request from the DIU Write Interface. The Address Generator supplies the DRAM write address on sfu_diu_wadr[21:5]. A winning read request from the DIU read request arbitration logic causes a read request from the DIU Read Interface. The Address Generator supplies the DRAM read address on sfu_diu_radr[21:5].

[3309] The address generator is configured with the number of DRAM words to read in a HCU line, hcu_dram_words, the first DRAM address of the SFU area, start_sfu_adr[21:5], and the last DRAM address of the SFU area, end_sfu_adr[21:5].

[3310] Note that the hcu_dram_words configuration register specifies the number of DRAM words consumed per line in the HCU, while lbd_dram_words specifies the number of DRAM words generated per line by the LBD. These values are not required to be the same.

[3311] For example, the LBD may store 10 DRAM words per line (lbd_dram_words=10), but the HCU may consume 5 DRAM words per line. In such a case hcu_dram_words would be set to 5 and the HCUReadLineFIFO would trigger a new line after it had consumed 5 DRAM words (via hrf_hcu_endofline).

[3312] Address Generation

[3313] There are four address pointers used to manage the bi-level DRAM buffer:

[3314] a. hcu_readline_rd_adr is the read address in DRAM for the HCUReadLineFIFO.

[3315] b. hcu_startreadline_adr is the start address in DRAM for the current line being read by the HCUReadLineFIFO.

[3316] c. lbd_nextline_wr_adr is the write address in DRAM for the LBDNextLineFIFO.

[3317] d. lbd_prevline_rd_adr is the read address in DRAM for the LBDPrevLineFIFO.

[3318] The current values of these address pointers are readable by the CPU.

[3319] Four corresponding address valid flags are required to indicate whether the address pointers are valid, based on whether the FIFOs are full or empty.

[3320] a. hlf_adrvalid, derived from hrf_nlf_fifo_emp

[3321] b. hlf_start_adrvalid, derived from start_hrf_nlf_fifo_emp

[3322] c. nlf_adrvalid, derived from nlf_plf_fifo_full and nlf_hrf_fifo_full

[3323] d. plf_adrvalid, derived from plf_nlf_fifo_emp

[3324] DRAM requests from the FIFOs will not be issued to the DIU until the appropriate address flag is valid.

[3325] Once a request has been acknowledged, the address generation logic can calculate the address of the next 256-bit word in DRAM, ready for the next request.

[3326] Rules for Address Pointers

[3327] The address pointers must obey certain rules which indicate whether they are valid:

[3328] a. hcu_readline_rd_adr is only valid if it is reading earlier in the line than lbd_nextline_wr_adr is writing, i.e. the FIFO is not empty.

[3329] b. The SFU (lbd_nextline_wr_adr) cannot overwrite the current line that the HCU is reading from (hcu_startreadline_adr), i.e. the FIFO is not full when compared with the HCU read line pointer.

[3330] c. The LBDNextLineFIFO (lbd_nextline_wr_adr) must be writing earlier in the line than the LBDPrevLineFIFO (lbd_prevline_rd_adr) is reading and must not overwrite the current line that the HCU is reading from, i.e. the FIFO is not full when compared to the LBDPrevLineFIFO read pointer.

[3331] d. The LBDPrevLineFIFO (lbd_prevline_rd_adr) can read right up to the address that the LBDNextLineFIFO (lbd_nextline_wr_adr) is writing, i.e. the FIFO is not empty.

[3332] e. At startup, i.e. when sfu_go is asserted, the pointers are reset to start_sfu_adr[21:5].

[3333] f. The address pointers can wrap around the SFU bi-level store area in DRAM.

[3334] Address Generator Pseudo-Code:

[3335] Initialization:

    if (sfu_go rising edge) then {
        // initialise address pointers to start of SFU address space
        lbd_prevline_rd_adr = start_sfu_adr[21:5]
        lbd_nextline_wr_adr = start_sfu_adr[21:5]
        hcu_readline_rd_adr = start_sfu_adr[21:5]
        hcu_startreadline_adr = start_sfu_adr[21:5]
        lbd_nextline_wr_wrap = 0
        lbd_prevline_rd_wrap = 0
        hcu_startreadline_wrap = 0
        hcu_readline_rd_wrap = 0
    }

[3336] Determine FIFO full and empty status:

    // calculate which FIFOs are full and empty
    plf_nlf_fifo_emp = (lbd_prevline_rd_adr == lbd_nextline_wr_adr) AND
                       (lbd_prevline_rd_wrap == lbd_nextline_wr_wrap)
    nlf_plf_fifo_full = (lbd_nextline_wr_adr == lbd_prevline_rd_adr) AND
                        (lbd_prevline_rd_wrap != lbd_nextline_wr_wrap)
    nlf_hrf_fifo_full = (lbd_nextline_wr_adr == hcu_startreadline_adr) AND
                        (hcu_startreadline_wrap != lbd_nextline_wr_wrap)
    // hcu start address can jump addresses and so needs comparator
    if (hcu_startreadline_wrap == lbd_nextline_wr_wrap) then
        start_hrf_nlf_fifo_emp = (hcu_startreadline_adr >= lbd_nextline_wr_adr)
    else
        start_hrf_nlf_fifo_emp = NOT(hcu_startreadline_adr >= lbd_nextline_wr_adr)
    // hcu read address can jump addresses and so needs comparator
    if (hcu_readline_rd_wrap == lbd_nextline_wr_wrap) then
        hrf_nlf_fifo_emp = (hcu_readline_rd_adr >= lbd_nextline_wr_adr)
    else
        hrf_nlf_fifo_emp = NOT(hcu_readline_rd_adr >= lbd_nextline_wr_adr)
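The wrap-bit comparisons above can be illustrated with a minimal Python sketch. The function names are hypothetical; each pointer is modelled as an (address, wrap bit) pair, and the logic mirrors the pseudocode rather than any particular hardware implementation.

```python
def fifo_empty(rd_adr, rd_wrap, wr_adr, wr_wrap):
    """Reader has caught up with the writer: same address, same wrap bit."""
    return rd_adr == wr_adr and rd_wrap == wr_wrap


def fifo_full(rd_adr, rd_wrap, wr_adr, wr_wrap):
    """Writer has lapped the reader: same address, differing wrap bits."""
    return rd_adr == wr_adr and rd_wrap != wr_wrap


def jumping_reader_empty(rd_adr, rd_wrap, wr_adr, wr_wrap):
    """Empty test for a read pointer that can jump by more than one word.

    Because the HCU pointers can skip addresses, equality is not enough
    and a magnitude comparison is used, inverted when the wrap bits differ.
    """
    at_or_past_writer = rd_adr >= wr_adr
    return at_or_past_writer if rd_wrap == wr_wrap else not at_or_past_writer
```

The single wrap bit distinguishes the full and empty cases, which would otherwise both appear as equal read and write addresses.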

[3337] Address pointer updating:

    // LBD Next line FIFO
    // if DIU write acknowledge and LBDNextLineFIFO is not full with reference to PLF and HRF
    if (diu_sfu_wack == 1 AND nlf_plf_fifo_full != 1 AND nlf_hrf_fifo_full != 1) then
        if (lbd_nextline_wr_adr == end_sfu_adr) then          // if end of SFU address range
            lbd_nextline_wr_adr = start_sfu_adr               // go to start of SFU address range
            lbd_nextline_wr_wrap = NOT(lbd_nextline_wr_wrap)  // invert the wrap bit
        else
            lbd_nextline_wr_adr++                             // increment address pointer

    // LBD PrevLine FIFO
    // if DIU read acknowledge and LBDPrevLineFIFO is not empty
    if (diu_sfu_rack == 1 AND select_hrfplf == 1 AND plf_nlf_fifo_emp != 1) then
        if (lbd_prevline_rd_adr == end_sfu_adr) then
            lbd_prevline_rd_adr = start_sfu_adr               // go to start of SFU address range
            lbd_prevline_rd_wrap = NOT(lbd_prevline_rd_wrap)  // invert the wrap bit
        else
            lbd_prevline_rd_adr++                             // increment address pointer

    // HCU ReadLine FIFO
    // if DIU read acknowledge and HCUReadLineFIFO fifo is not empty
    if (diu_sfu_rack == 1 AND select_hrfplf == 0 AND hrf_nlf_fifo_emp != 1) then
        // going to update hcu read line address
        if (hrf_hcu_endofline == 1) AND (hrf_yadvance == 1) then {
            // read the next line from DRAM
            // advance to start of next HCU line in DRAM
            hcu_startreadline_adr = hcu_startreadline_adr + lbd_dram_words
            offset = hcu_startreadline_adr - end_sfu_adr - 1
            // allow for address wraparound
            if (offset >= 0) then
                hcu_startreadline_adr = start_sfu_adr + offset
                hcu_startreadline_wrap = NOT(hcu_startreadline_wrap)
            hcu_readline_rd_adr = hcu_startreadline_adr
            hcu_readline_rd_wrap = hcu_startreadline_wrap
        }
        elsif (hrf_hcu_endofline == 1) AND (hrf_yadvance == 0) then
            hcu_readline_rd_adr = hcu_startreadline_adr       // restart and re-use the same line
            hcu_readline_rd_wrap = hcu_startreadline_wrap
        elsif (hcu_readline_rd_adr == end_sfu_adr) then       // check if the FIFO needs to wrap space
            hcu_readline_rd_adr = start_sfu_adr               // go to start of SFU address space
            hcu_readline_rd_wrap = NOT(hcu_readline_rd_wrap)
        else
            hcu_readline_rd_adr++                             // increment address pointer
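The two kinds of pointer movement in the pseudocode above, a single-word increment and a whole-line jump, can be sketched in Python as follows. The function names advance_word and advance_line are hypothetical, addresses are plain integers in units of 256-bit DRAM words, and the region bounds in the usage note are made-up values for illustration.

```python
def advance_word(adr, wrap, start_sfu_adr, end_sfu_adr):
    """Step a pointer by one DRAM word, wrapping at the end of the SFU area."""
    if adr == end_sfu_adr:
        return start_sfu_adr, wrap ^ 1     # go to start and invert the wrap bit
    return adr + 1, wrap


def advance_line(start_adr, wrap, lbd_dram_words, start_sfu_adr, end_sfu_adr):
    """Move the HCU start-of-line pointer forward by one stored line."""
    new_adr = start_adr + lbd_dram_words
    offset = new_adr - end_sfu_adr - 1
    if offset >= 0:                        # ran past the end: wrap around
        return start_sfu_adr + offset, wrap ^ 1
    return new_adr, wrap
```

For instance, with an assumed SFU area spanning words 64 to 105 and 10 words per line, advancing a line that starts at word 100 lands on word 68 with the wrap bit inverted, since 6 words fit before the end of the area and the remaining 4 occupy words 64 to 67.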

[3338] 25.8.10.4.1 X Scaling of Data for HCUReadLineFIFO

[3339] The signal hcu_sfu_advdot tells the HCUReadLineFIFO to supply the next dot or the current dot on sfu_hcu_sdata according to the hrf_xadvance signal from the scaling control unit. When hrf_xadvance is 1 the HCUReadLineFIFO should supply the next dot. When hrf_xadvance is 0 the HCUReadLineFIFO should supply the current dot.

[3340] The algorithm for non-integer scaling is described in the pseudocode below. Note, x_scale_count should be loaded with x_start_count after reset and at the end of each line. The end of the line is indicated by hrf_hcu_endofline from the HCUReadLineFIFO.

    if (hcu_sfu_advdot == 1) then
        if (x_scale_count + x_scale_denom - x_scale_num >= 0) then
            x_scale_count = x_scale_count + x_scale_denom - x_scale_num
            hrf_xadvance = 1
        else
            x_scale_count = x_scale_count + x_scale_denom
            hrf_xadvance = 0
    else
        x_scale_count = x_scale_count
        hrf_xadvance = 0
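To see what advance pattern the accumulator produces, the following minimal Python sketch runs the X scaling rule for a number of hcu_sfu_advdot strobes. The function name scale_advances is hypothetical, every iteration is assumed to be a cycle in which hcu_sfu_advdot is asserted, and the numerator and denominator values in the usage note are illustrative choices rather than values taken from the description.

```python
def scale_advances(start_count, scale_num, scale_denom, strobes):
    """Return the hrf_xadvance value produced on each advdot strobe."""
    count = start_count
    advances = []
    for _ in range(strobes):
        if count + scale_denom - scale_num >= 0:
            count = count + scale_denom - scale_num
            advances.append(1)             # supply the next dot
        else:
            count = count + scale_denom
            advances.append(0)             # replicate the current dot
    return advances
```

With a start count of 0, a numerator of 2 and a denominator of 1, the sketch advances on every second strobe, so each source dot is output twice. With a numerator of 3 and a denominator of 2 it advances on two of every three strobes, giving three output dots for every two source dots.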

[3341] 25.8.10.4.2 Y Scaling of Data for HCUReadLineFIFO

[3342] The HCUReadLineFIFO counts the number of hcu_sfu_advdot strobes received from the HCU. When the count equals hcu_num_dots the HCUReadLineFIFO will assert hrf_hcu_endofline for a cycle.

[3343] The algorithm for non-integer scaling is described in the pseudocode below. Note, y_scale_count should be loaded with zero after reset.

    if (hrf_hcu_endofline == 1) then
        if (y_scale_count + y_scale_denom - y_scale_num >= 0) then
            y_scale_count = y_scale_count + y_scale_denom - y_scale_num
            hrf_yadvance = 1
        else
            y_scale_count = y_scale_count + y_scale_denom
            hrf_yadvance = 0
    else
        y_scale_count = y_scale_count
        hrf_yadvance = 0

[3344] When hrf_hcu_endofline is asserted the Y scaling unit will decide whether to go back to the start of the current line, by setting hrf_yadvance=0, or go on to the next line, by setting hrf_yadvance=1.

[3345] FIG. 176 shows an overview of X and Y scaling for HCU data.

[3346] 26 Tag Encoder (TE)

[3347] 26.1 Overview

[3348] The Tag Encoder (TE) provides functionality for Netpage-enabled applications, and typically requires the presence of IR ink (although K ink can be used for tags in limited circumstances). The TE encodes fixed data for the page being printed, together with specific tag data values, into an error-correctable encoded tag which is subsequently printed in infrared or black ink on the page. The TE places tags on a triangular grid, and can be programmed for both landscape and portrait orientations.

[3349] Basic tag structures are normally rendered at 1600 dpi, while tag data is encoded into an arbitrary number of printed dots. The TE supports integer scaling in the Y-direction while the TFU supports integer scaling in the X-direction. Thus, the TE can render tags at resolutions less than 1600 dpi which can be subsequently scaled up to 1600 dpi.

[3350] The output from the TE is buffered in the Tag FIFO Unit (TFU), which is in turn used as input by the HCU. In addition, a te_finishedband signal is output to the end of band unit once the input tag data has been loaded from DRAM. The high level data path is shown by the block diagram in FIG. 177.

[3351] After passing through the HCU, the tag plane is subsequently printed with an infrared-absorptive ink that can be read by a Netpage sensing device. Since black ink can be IR absorptive, limited functionality can be provided on offset-printed pages using black ink on otherwise blank areas of the page, for example to encode buttons. Alternatively an invisible infrared ink can be used to print the position tags over the top of a regular page. However, if invisible IR ink is used, care must be taken to ensure that any other printed information on the page is printed in infrared-transparent CMY ink, as black ink will obscure the infrared tags. The monochromatic scheme was chosen to maximize dynamic range in blurry reading environments.

[3352] When multiple SoPEC chips are used for printing the same side of a page, it is possible that a single tag will be produced by two SoPEC chips. This implies that the TE must be able to print partial tags.

[3353] The throughput requirement for the SoPEC TE is to produce tags at half the rate of the PEC1 TE.

[3354] Since the TE is reused from PEC1, the SoPEC TE over-produces by a factor of 2.

[3355] In PEC1, in order to keep up with the HCU which processes 2 dots per cycle, the tag data interface has been designed to be capable of encoding a tag in 63 cycles. This is actually accomplished in approximately 52 cycles within PEC1. If the SoPEC TE were to be modified from two dots production per cycle to a nominal one dot per cycle it should not lose the 63/52 cycle performance edge attained in the PEC1 TE.

[3356] 26.2 What are Tags?

[3357] The first barcode was described in the late 1940's by Woodland and Silver, and finally patented in 1952 (U.S. Pat. No. 2,612,994) when electronic parts were scarce and very expensive. Now however, with the advent of cheap and readily available computer technology, nearly every item purchased from a shop contains a barcode of some description on the packaging. From books to CDs, to grocery items, the barcode provides a convenient way of identifying an object by a product number. The exact interpretation of the product number depends on the type of barcode. Warehouse inventory tracking systems let users define their own product number ranges, while inventory in shops must be more universally encoded so that products from one company don't overlap with products from another company. Universal Product Codes (UPC) were introduced in the mid 1970's at the request of the National Association of Food Chains for this very reason. Barcodes themselves have been specified in a large number of formats. The older barcode formats contain characters that are displayed in the form of lines. The combination of black and white lines describes the information the barcode contains. Often there are two types of lines to form the complete barcode: the characters (the information itself) and lines to separate blocks for better optical recognition. While the information may change from barcode to barcode, the lines to separate blocks stay constant. The lines to separate blocks can therefore be thought of as part of the constant structural components of the barcode.

[3358] Barcodes are read with specialized reading devices that then pass the extracted data on to the computer for further processing. For example, a point-of-sale scanning device allows the sales assistant to add the scanned item to the current sale, places the name of the item and the price on a display device for verification, etc. Light-pens, gun readers, scanners, slot readers, and cameras are among the many devices used to read the barcodes.

[3359] To help ensure that the data extracted was read correctly, checksums were introduced as a crude form of error detection. More recent barcode formats, such as the Aztec 2D barcode developed by Andy Longacre in 1995 (U.S. Pat. No. 5,591,956), but now released to the public domain, use redundancy encoding schemes such as Reed-Solomon. Reed-Solomon encoding is adequately discussed in [28], [30] and [34]. The reader is advised to refer to these sources for background information. Very often the degree of redundancy encoding is user selectable.

[3360] More recently there has also been a move from the simple one dimensional barcodes (line based) to two dimensional barcodes. Instead of storing the information as a series of lines, where the data can be extracted from a single dimension, the information is encoded in two dimensions. Just as with the original barcodes, the 2D barcode contains both information and structural components for better optical recognition. FIG. 178 shows an example of a QR Code (Quick Response Code), developed by Denso of Japan (U.S. Pat. No. 5,726,435). Note the barcode cell is comprised of two areas: a data area (which depends on the data being stored in the barcode), and a constant position detection pattern. The constant position detection pattern is used by the reader to help locate the cell itself, then to locate the cell boundaries, and to allow the reader to determine the original orientation of the cell (orientation can be determined by the fact that there is no 4th corner pattern).

[3361] The number of barcode encoding schemes grows daily. Yet very often the hardware for producing these barcodes is specific to the particular barcode format. As printers become more and more embedded, there is an increasing desire for real-time printing of these barcodes. In particular, Netpage enabled applications require the printing of 2D barcodes (or tags) over the page, preferably in infra-red ink. The tag encoder in SoPEC uses a generic barcode format encoding scheme which is particularly suited to real-time printing. Since the barcode encoding format is generic, the same rendering hardware engine can be used to produce a wide variety of barcode formats.

[3362] Unfortunately the term "barcode" is interpreted in different ways by different people. Sometimes it refers only to the data area component, and does not include the constant position detection pattern. In other cases it refers to both data and constant position detection pattern.

[3363] We therefore use the term tag to refer to the combination of data and any other components (such as position detection pattern, surrounding blank space, etc.) that must be rendered to help hold or locate/read the data. A tag therefore contains the following components:

[3364] data area(s). The data area is the whole reason that the tag exists. The tag data area(s) contains the encoded data (optionally redundancy-encoded, perhaps simply checksummed) where the bits of the data are placed within the data area at locations specified by the tag encoding scheme.

[3365] constant background patterns, which typically include a constant position detection pattern. These help the tag reader to locate the tag. They include components that are easy to locate and may contain orientation and perspective information in the case of 2D tags. Constant background patterns may also include such patterns as a blank area surrounding the data area or position detection pattern. These blank patterns can aid in the decoding of the data by ensuring that there is no interference between tags or data areas.

[3366] In most tag encoding schemes there is at least some constant background pattern, but it is not necessarily required by all. For example, if the tag data area is enclosed by a physical space and the reading means uses a non-optical location mechanism (e.g. physical alignment of surface to data reader) then a position detection pattern is not required.

[3367] Different tag encoding schemes have different sized tags, and have different allocation of physical tag area to constant position detection pattern and data area. For example, the QR code has 3 fixed blocks at the edges of the tag for position detection pattern (see FIG. 178) and a data area in the remainder. By contrast, the Netpage tag structure (see FIGS. 179 and 180) contains a circular locator component, an orientation feature, and several data areas. FIG. 179(a) shows the Netpage tag constant background pattern in a resolution independent form. FIG. 179(b) is the same as FIG. 179(a), but with the addition of the data areas to the Netpage tag. FIG. 180 is an example of dot placement and rendering to 1600 dpi for a Netpage tag. Note that in FIG. 180 a single bit of data is represented by many physical output dots to form a block within the data area.

[3368] 26.2.1 Contents of the Data Area

[3369] The data area contains the data for the tag.

[3370] Depending on the tag's encoding format, a single bit of data may be represented by a number of physical printed dots. The exact number of dots will depend on the output resolution and the target reading/scanning resolution. For example, in the QR code (see FIG. 178), a single bit is represented by a dark module or a light module, where the exact number of dots in the dark module or light module depends on the rendering resolution and target reading/scanning resolution. For example, a module may be represented by a square block of printed dots (all on for binary 1, or all off for binary 0), as shown in FIG. 181.

[3371] The point to note here is that a single bit of data may be represented in the printed tag by an arbitrary printed shape. The smallest shape is a single printed dot, while the largest shape is theoretically the whole tag itself, for example a giant macrodot comprised of many printed dots in both dimensions.

[3372] An ideal generic tag definition structure allows the generation of an arbitrary printed shape from each bit of data.

[3373] 26.2.2 What do the Bits Represent?

[3374] Given an original number of bits of data, and the desire to place those bits into a printed tag for subsequent retrieval via a reading/scanning mechanism, the original number of bits can either be placed directly into the tag, or they can be redundancy-encoded in some way. The exact form of redundancy encoding will depend on the tag format.

[3375] The placement of data bits within the data area of the tag is directly related to the redundancy mechanism employed in the encoding scheme. The idea is generally to place data bits together in 2D so that burst errors are averaged out over the tag data, thus typically being correctable. For example, all the bits of a Reed-Solomon codeword would be spread out over the entire tag data area so as to minimize the effect of a burst error.

[3376] Since the data encoding scheme and shape and size of the tag data area are closely linked, it is desirable to have a generic tag format structure. This allows the same data structure and rendering embodiment to be used to render a variety of tag formats.

[3377] 26.2.2.1 Fixed and Variable Data Components

[3378] In many cases, the tag data can be reasonably divided into fixed and variable components. For example, if a tag holds N bits of data, some of these bits may be fixed for all tags while some may vary from tag to tag.

[3379] For example, the Universal Product Code allows a country code and a company code. Since these bits don't change from tag to tag, these bits can be defined as fixed, and don't need to be provided to the tag encoder each time, thereby reducing the bandwidth when producing many tags.

[3380] Another example is Netpage tags. A single printed page contains a number of Netpage tags. The page-id will be constant across all the tags, even though the remainder of the data within each tag may be different for each tag. By reducing the amount of variable data being passed to SoPEC's tag encoder for each tag, the overall bandwidth can be reduced.

[3381] Depending on the embodiment of the tag encoder, these parameters will be either implicit or explicit, and may limit the size of tags renderable by the system. For example, a software tag encoder may be completely variable, while a hardware tag encoder such as SoPEC's tag encoder may have a maximum number of tag data bits.

[3382] 26.2.2.2 Redundancy-Encode the Tag Data within the Tag Encoder

[3383] Instead of accepting the complete number of TagData bits encoded by an external encoder, the tag encoder accepts the basic non-redundancy-encoded data bits and encodes them as required for each tag. This leads to significant savings of bandwidth and on-chip storage.

[3384] In SoPEC's case for Netpage tags, only 120 bits of original data are provided per tag, and the tag encoder encodes these 120 bits into 360 bits. By having the redundancy encoder on board the tag encoder the effective bandwidth and internal storage required is reduced to only 33% of what would be required if the encoded data was read directly.

[3385] 26.3 Placement of Tags on a Page

[3386] The TE places tags on the page in a triangular grid arrangement as shown in FIG. 182.

[3387] The triangular mesh of tags combined with the restriction of no overlap of columns or rows of tags means that the process of tag placement is greatly simplified. For a given line of dots, all the tags on that line correspond to the same part of the general tag structure. The triangular placement can be considered as alternating lines of tags, where one line of tags is inset by one amount in the dot dimension, and the other line of tags is inset by a different amount. The dot inter-tag gap is the same in both lines of tags, and is different from the line inter-tag gap.

[3388] Note also that as long as the tags themselves can be rotated, portrait and landscape printing are essentially the same: the placement parameters of line and dot are swapped, but the placement mechanism is the same.

[3389] The general case for placement of tags therefore relies on a number of parameters, as shown in FIG. 183.

[3390] The parameters are more formally described in Table 169. Note that these are placement parameters and not registers.

TABLE 169 Tag placement parameters
parameter | description | restrictions
Tag height | The number of dot lines in a tag's bounding box. | minimum 1
Tag width | The number of dots in a single line of the tag's bounding box. The number of dots in the tag itself may vary depending on the shape of the tag, but the number of dots in the bounding box will be constant (by definition). | minimum 1
Dot inter-tag gap | The number of dots from the edge of one tag's bounding box to the start of the next tag's bounding box, in the dot direction. | minimum = 0
Line inter-tag gap | The number of dot lines from the edge of one tag's bounding box to the start of the next tag's bounding box, in the line direction. | minimum = 0
Start Position | Defines the status of the top left dot on the page. Is an offset in dot and row within the tag or the inter-tag gap. | none
AltTagLinePosition | Defines the status for the start of the alternate row of tags. Is an offset in dot within the tag or within the dot inter-tag gap (the row position is always 0). | none

[3391] 26.4 Basic Tag Encoding Parameters

[3392] SoPEC's tag encoder imposes range restrictions on tag encoding parameters as a direct result of on-chip buffer sizes. Table 170 lists the basic encoding parameters as well as range restrictions where appropriate. Although the restrictions were chosen to take the most likely encoding scenarios into account, it is a simple matter to adjust the buffer sizes and corresponding addressing to allow arbitrary encoding parameters in future implementations.

TABLE 170 Encoding parameters
name | definition | maximum value imposed by TE
W | page width | 2¹⁴ dotpairs or 20.48 inches at 1600 dpi
S | tag size | typical tag size is 2 mm × 2 mm; maximum tag size is 384 dots × 384 dots before scaling, i.e. 6 mm × 6 mm at 1600 dpi
N | number of dots in each dimension of the tag | 384 dots before scaling
E | redundancy encoding for tag data | Reed-Solomon GF(2⁴) at 5:10 or 7:8
D_(F) | size of fixed data (unencoded) | 40 or 56 bits
R_(F) | size of redundancy-encoded fixed data | 120 bits
D_(V) | size of variable data (unencoded) | 120 or 112 bits
R_(V) | size of redundancy-encoded variable data | 360 or 240 bits
T | tags per page width | 256

[3393] The fixed data for the tags on a page need only be supplied to the TE once. It can be supplied as 40 or 56 bits of unencoded data and encoded within the TE as described in Section 26.4.1. Alternatively it can be supplied as 120 bits of pre-encoded data (encoded arbitrarily).

[3394] The variable data for the tags on a page are those 112 or 120 data bits that are variable for each tag. Variable tag data is supplied as part of the band data, and is always encoded by the TE as described in Section 26.4.1, but may itself be arbitrarily pre-encoded.

[3395] 26.4.1 Redundancy Encoding

[3396] The mapping of data bits (both fixed and variable) to redundancy encoded bits relies heavily on the method of redundancy encoding employed. Reed-Solomon encoding was chosen for its ability to deal with burst errors and effectively detect and correct errors using a minimum of redundancy. Reed-Solomon encoding is adequately discussed in [28], [30] and [34]. The reader is advised to refer to these sources for background information.

[3397] In this implementation of the TE we use Reed-Solomon encoding over the Galois Field GF(2⁴). Symbol size is 4 bits. Each codeword contains 15 4-bit symbols for a codeword length of 60 bits.

[3398] The primitive polynomial is p(x)=x⁴+x+1, and the generator polynomial is g(x)=(x+α)(x+α²) . . . (x+α^(2t)), where t=the number of symbols that can be corrected.

[3399] Of the 15 symbols, there are two possibilities for encoding:

[3400] RS(15, 5): 5 symbols original data (20 bits), and 10 redundancy symbols (40 bits). The 10 redundancy symbols mean that we can correct up to 5 symbols in error. The generator polynomial is therefore g(x)=(x+α)(x+α²) . . . (x+α¹⁰).

[3401] RS(15, 7): 7 symbols original data (28 bits), and 8 redundancy symbols (32 bits). The 8 redundancy symbols mean that we can correct up to 4 symbols in error. The generator polynomial is g(x)=(x+α)(x+α²) . . . (x+α⁸).
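As a rough illustration of the two encodings above, the following Python sketch builds GF(2⁴) from p(x)=x⁴+x+1, forms g(x) for a given number of redundancy symbols, and computes the redundancy symbols by polynomial division. The function names, the systematic layout (data symbols first, redundancy symbols last) and the symbol ordering are assumptions made for illustration only; the ordering actually used inside the TE is not stated in this section.

```python
# Sketch only: Reed-Solomon encoding over GF(2^4), p(x) = x^4 + x + 1.
PRIM = 0b10011  # x^4 + x + 1

# EXP[i] = alpha^i (alpha = x); LOG is the inverse mapping for non-zero values.
EXP = [0] * 30
LOG = [0] * 16
value = 1
for i in range(15):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0b10000:
        value ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(num_parity):
    """g(x) = (x + alpha)(x + alpha^2)...(x + alpha^num_parity), highest degree first."""
    g = [1]
    for i in range(1, num_parity + 1):
        root = EXP[i]
        nxt = g + [0]                      # g(x) * x
        for j, coef in enumerate(g):       # plus g(x) * root
            nxt[j + 1] ^= gf_mul(coef, root)
        g = nxt
    return g


def rs_encode(data, num_parity):
    """Return data symbols followed by num_parity redundancy symbols (assumed layout)."""
    g = generator_poly(num_parity)
    rem = list(data) + [0] * num_parity
    for i in range(len(data)):
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return list(data) + rem[len(data):]


def poly_eval(poly, x):
    """Evaluate a polynomial (highest degree first) at x using Horner's rule."""
    acc = 0
    for coef in poly:
        acc = gf_mul(acc, x) ^ coef
    return acc


codeword_15_5 = rs_encode([1, 2, 3, 4, 5], 10)        # RS(15, 5)
codeword_15_7 = rs_encode([1, 2, 3, 4, 5, 6, 7], 8)   # RS(15, 7)
```

A valid codeword evaluates to zero at each root of g(x), which gives a simple self-check for the sketch.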

[3402] In the first case, with 5 symbols of original data, the total amount of original data per tag is 160 bits (40 fixed, 120 variable). This is redundancy encoded to give a total amount of 480 bits (120 fixed, 360 variable) as follows:

[3403] Each tag contains up to 40 bits of fixed original data. Therefore 2 codewords are required for the fixed data, giving a total encoded data size of 120 bits. Note that this fixed data only needs to be encoded once per page.

[3404] Each tag contains up to 120 bits of variable original data. Therefore 6 codewords are required for the variable data, giving a total encoded data size of 360 bits.

[3405] In the second case, with 7 symbols of original data, the total amount of original data per tag is 168 bits (56 fixed, 112 variable). This is redundancy encoded to give a total amount of 360 bits (120 fixed, 240 variable) as follows:

[3406] Each tag contains up to 56 bits of fixed original data. Therefore 2 codewords are required for the fixed data, giving a total encoded data size of 120 bits. Note that this fixed data only needs to be encoded once per page.

[3407] Each tag contains up to 112 bits of variable original data. Therefore 4 codewords are required for the variable data, giving a total encoded data size of 240 bits.
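The codeword counts and encoded sizes quoted for the two cases follow directly from the symbol counts. The short Python sketch below reproduces the arithmetic; the function name and argument names are illustrative and not taken from the TE.

```python
SYMBOL_BITS = 4
CODEWORD_SYMBOLS = 15
CODEWORD_BITS = SYMBOL_BITS * CODEWORD_SYMBOLS  # 60 bits per codeword


def encoded_sizes(data_symbols, fixed_bits, variable_bits):
    """Return (encoded fixed bits, encoded variable bits) for one tag."""
    data_bits_per_codeword = data_symbols * SYMBOL_BITS
    # Ceiling division: a partly filled codeword still occupies a whole codeword.
    fixed_codewords = -(-fixed_bits // data_bits_per_codeword)
    variable_codewords = -(-variable_bits // data_bits_per_codeword)
    return fixed_codewords * CODEWORD_BITS, variable_codewords * CODEWORD_BITS


case_5_10 = encoded_sizes(5, 40, 120)   # RS(15, 5)
case_7_8 = encoded_sizes(7, 56, 112)    # RS(15, 7)
```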

[3408] The choice of data to redundancy ratio depends on the application.

[3409] 26.5 Data Structures Used by Tag Encoder

[3410] 26.5.1 Tag Format Structure

[3411] The Tag Format Structure (TFS) is the template used to render tags, optimized so that the tag can be rendered in real time. The TFS contains an entry for each dot position within the tag's bounding box. Each entry specifies whether the dot is part of the constant background pattern or part of the tag's data component (both fixed and variable).

[3412] The TFS is very similar to a bitmap in that it contains one entry for each dot position of the tag's bounding box. The TFS therefore has TagHeight×TagWidth entries, where TagHeight matches the height of the bounding box for the tag in the line dimension, and TagWidth matches the width of the bounding box for the tag in the dot dimension. A single line of TFS entries for a tag is known as a tag line structure.

[3413] The TFS consists of TagHeight number of tag line structures, one for each 1600 dpi line in the tag's bounding box. Each tag line structure contains three contiguous tables, known as tables A, B, and C. Table A contains 384 2-bit entries, one entry for each of the maximum number of dots in a single line of a tag (see Table). The actual number of entries used should match the size of the bounding box for the tag in the dot dimension, but all 384 entries must be present. Table B contains 32 9-bit data addresses that refer to (in order of appearance) the data dots present in the particular line. All 32 entries must be present, even if fewer are used. Table C contains two 5-bit pointers into table B, and therefore comprises 10 bits. Padding of 214 bits is added. The total length of each tag line structure is therefore 5×256-bit DRAM words. Thus a TFS containing TagHeight tag line structures requires TagHeight×160 bytes. The structure of a TFS is shown in FIG. 184.
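The size figures for a tag line structure can be checked with a few lines of arithmetic. The Python sketch below is only a restatement of the numbers given above; the constant and function names are illustrative.

```python
DRAM_WORD_BITS = 256

TABLE_A_BITS = 384 * 2   # 384 two-bit entries
TABLE_B_BITS = 32 * 9    # 32 nine-bit data addresses
TABLE_C_BITS = 2 * 5     # two five-bit pointers into table B

used_bits = TABLE_A_BITS + TABLE_B_BITS + TABLE_C_BITS
words_per_tag_line = -(-used_bits // DRAM_WORD_BITS)            # ceiling division
padding_bits = words_per_tag_line * DRAM_WORD_BITS - used_bits
bytes_per_tag_line = words_per_tag_line * DRAM_WORD_BITS // 8


def tfs_bytes(tag_height):
    """DRAM bytes occupied by a TFS with tag_height tag line structures."""
    return tag_height * bytes_per_tag_line
```

For a typical 126-line tag, `tfs_bytes(126)` gives 20160 bytes.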

[3414] A full description of the interpretation and usage of Tables A, B and C is given in section 26.8.3 on page 444.

[3415] 26.5.1.1 Scaling a Tag

[3416] If the size of the printed dots is too small, then the tag can be scaled in one of several ways. The tag itself can be scaled by N dots in each dimension, which increases the number of entries in the TFS. As an alternative, the output from the TE can be scaled up by pixel replication via a scale factor greater than 1 in both the TE and TFU.

[3417] For example, if the original TFS was 21×21 entries, and the scaling were a simple 2×2 dots for each of the original dots, we could increase the TFS to be 42×42. To generate the new TFS from the old, we would repeat each entry across each line of the TFS, and then we would repeat each line of the TFS. The net number of entries in the TFS would be increased fourfold (2×2).
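The replication described above can be sketched in a few lines of Python. Here the TFS is modelled simply as a list of rows of entries, which is an assumption made for illustration and not the DRAM layout of tables A, B and C.

```python
def scale_tfs(tfs, factor):
    """Repeat each entry across its line, then repeat each line, factor times."""
    return [
        [entry for entry in row for _ in range(factor)]
        for row in tfs
        for _ in range(factor)
    ]


small = [[0, 1],
         [2, 3]]
scaled = scale_tfs(small, 2)

# A 21x21 TFS scaled by 2 in each dimension becomes 42x42.
original = [[0] * 21 for _ in range(21)]
doubled = scale_tfs(original, 2)
```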

[3418] The TFS allows the creation of macrodots instead of simple scaling. Looking at FIG. 185 for a simple example of a 3×3 dot tag, we may want to produce a physically large printed form of the tag, where each of the original dots was represented by 7×7 printed dots. If we simply performed replication by 7 in each dimension of the original TFS, either by increasing the size of the TFS by 7 in each dimension or putting a scale-up on the output of the tag generator, then we would have 9 sets of 7×7 square blocks. Instead, we can replace each of the original dots in the TFS by a 7×7 dot definition of a rounded dot. FIG. 186 shows the results.

[3419] Consequently, the higher the resolution of the TFS the more printed dots can be printed for each macrodot, where a macrodot represents a single data bit of the tag. The more dots that are available to produce a macrodot, the more complex the pattern of the macrodot can be. As an example, the Netpage tag structure can be rendered such that the data bits are represented by an average of 8 dots×8 dots (at 1600 dpi), but the actual shape structure of a dot is not square. This allows the printed Netpage tag to be subsequently read at any orientation.

[3420] 26.5.2 Raw Tag Data

[3421] The TE requires a band of unencoded variable tag data if variable data is to be included in the tag bit-plane. A band of unencoded variable tag data is a set of contiguous unencoded tag data records, in order of encounter from the top left of the printed band to the lower right.

[3422] An unencoded tag data record is 128 bits arranged as follows: bits 0-111 or 0-119 are the bits of raw tag data, bit 120 is a flag used by the TE (TagIsPrinted), and the remaining 7 bits are reserved (and should be 0). Having a record size of 128 bits simplifies the tag data access since the data of two tags fits into a 256-bit DRAM word. It also means that the flags can be stored apart from the tag data, thus keeping the raw tag data completely unrestricted. If there is an odd number of tags in a line then the last DRAM read will contain a tag in the first 128 bits and padding in the final 128 bits.
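The record layout above can be illustrated with a small pack/unpack pair in Python. The function names are hypothetical, and the little-endian byte order used to serialise the 128-bit value is an assumption; this section does not state how the record maps onto bytes in DRAM.

```python
TAG_IS_PRINTED_BIT = 120
RECORD_BYTES = 16  # 128 bits


def pack_record(raw_data, data_bits=120, tag_is_printed=True):
    """Build one 128-bit unencoded tag data record (byte order is assumed)."""
    if data_bits not in (112, 120):
        raise ValueError("raw tag data is 112 or 120 bits")
    if raw_data < 0 or raw_data >> data_bits:
        raise ValueError("raw tag data does not fit in data_bits")
    value = raw_data | (int(tag_is_printed) << TAG_IS_PRINTED_BIT)
    return value.to_bytes(RECORD_BYTES, "little")  # reserved bits stay 0


def unpack_record(record, data_bits=120):
    """Return (raw tag data, TagIsPrinted) from a 16-byte record."""
    value = int.from_bytes(record, "little")
    raw_data = value & ((1 << data_bits) - 1)
    tag_is_printed = bool((value >> TAG_IS_PRINTED_BIT) & 1)
    return raw_data, tag_is_printed


record = pack_record(0xABC, 120, True)
dram_word = record + pack_record(0x123, 120, False)  # two records per 256-bit word
```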

[3423] The TagIsPrinted flag allows the effective specification of a tag resolution mask over the page. For each tag position the TagIsPrinted flag determines whether any of the tag is printed or not. This allows arbitrary placement of tags on the page. For example, tags may only be printed over particular active areas of a page. The TagIsPrinted flag allows only those tags to be printed. TagIsPrinted is a 1 bit flag with values as shown in Table 171.

TABLE 171 TagIsPrinted values
Value | description
0 | Don't print the tag in this tag position. Output 0 for each dot within the tag bounding box.
1 | Print the tag as specified by the various tag structures.

[3424] 26.5.3 DRAM Storage Requirements

[3425] The total DRAM storage required by a single band of raw tag data depends on the number of tags present in that band. Each tag requires 128 bits. Consequently if there are N tags in the band, the size in DRAM is 16N bytes.

[3426] The maximum size of a line of tags is 163×128 bits. When maximally packed, a row of tags contains 163 tags (see Table) and extends over a minimum of 126 print lines. This equates to 282 KBytes over a Letter page.

[3427] The total DRAM storage required by a single TFS is TagHeight/7 KBytes (including padding). Since the likely maximum value for TagHeight is 384 (given that SoPEC restricts TagWidth to 384), the maximum size in DRAM for a TFS is 55 KBytes.

[3428] 26.5.4 DRAM Access Requirements

[3429] The TE has two separate read interfaces to DRAM: one for raw tag data (TD) and one for the tag format structure (TFS).

[3430] The memory usage requirements are shown in Table 172. Raw tag data is stored in the compressed page store.

TABLE 172 Memory usage requirements
Block | Size | Description
Compressed page store | 2048 Kbytes | Compressed data page store for Bi-level, contone and raw tag data.
Tag Format Structure | 55 Kbyte (384 dot line tags @ 1600 dpi) | 55 kB in PEC1 for 384 dot line tags (the benchmark) at 1600 dpi. 2.5 mm tags (1/10th inch) @ 1600 dpi require 160 dot lines = 160/384 × 55 or 23 kB. 2.5 mm tags @ 800 dpi require 80/384 × 55 = 12 kB.

[3431] The TD interface will read 256 bits from DRAM at a time. Each 256-bit read returns two 128-bit tag records. The TD interface to the DIU will be a 256-bit double buffer. If there is an odd number of tags in a line then the last DRAM read will contain a tag in the first 128 bits and padding in the final 128 bits.

[3432] The TFS interface will also read 256 bits from DRAM at a time. The TFS required for a line is 136 bytes. A total of five 256-bit DRAM reads is required to read the TFS for a line, with 192 unused bits in the fifth 256-bit word. A 136-byte double-line buffer will be implemented to store the TFS data.

[3433] The TE's DIU bandwidth requirements are summarized in Table 173.

TABLE 173 DRAM bandwidth requirements
Block Name | Direction | Maximum number of cycles between each 256-bit DRAM access | Peak Bandwidth (bits/cycle) | Average Bandwidth (bits/cycle)
TD | Read | Single 256 bit reads (note 1). | 1.02 | 1.02
TFS | Read | Single 256 bit reads (note 2). TFS is 136 bytes. This means there is unused data in the fifth 256 bit read. A total of 5 reads is required. | 0.093 | 0.093

[3434] 1: Each 2 mm tag lasts 126 dot cycles and requires 128 bits. This is a rate of 256 bits every 252 cycles.

[3435] 2: 17×64 bit reads per line in PEC1 is 5×256 bit reads per line in SoPEC with unused bits in the last 256-bit read.
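The figures in the two notes can be reproduced with simple arithmetic. The Python sketch below only restates numbers already given in this section (126 cycles and 128 bits per tag, 17 × 64-bit reads per TFS line); the variable names are illustrative.

```python
DRAM_WORD_BITS = 256

# Note 1: two 128-bit tag records (one 256-bit read) are consumed every 2 x 126 cycles.
tag_cycles = 126
record_bits = 128
td_bits_per_cycle = (2 * record_bits) / (2 * tag_cycles)

# Note 2: a 136-byte TFS line (17 x 64 bits in PEC1) needs five 256-bit reads in SoPEC.
tfs_line_bits = 17 * 64
tfs_reads_per_line = -(-tfs_line_bits // DRAM_WORD_BITS)          # ceiling division
tfs_unused_bits = tfs_reads_per_line * DRAM_WORD_BITS - tfs_line_bits
```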

[3436] 26.5.5 TD and TFS Bandstore Wrapping

TABLE 174 Bandstore Inputs from CDU
Port Name | Pins | I/O | Description
cdu_endofbandstore[21:5] | 17 | In | Address of the end of the current band of data. 256-bit word aligned DRAM address.
cdu_startofbandstore[21:5] | 17 | In | Address of the start of the current band of data. 256-bit word aligned DRAM address.

[3437] Both TD and TFS storage in DRAM can wrap around the bandstore area. The bounds of the band store are described by inputs from the CDU shown in Table 174. The TD and TFS DRAM interfaces therefore support bandstore wrapping. If the TD or TFS DRAM interface increments an address it is checked to see if it matches the end of bandstore address. If so, then the address is mapped to the start of the bandstore.
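The wrapping rule can be sketched as a single address-increment helper. In this Python sketch the end-of-bandstore address is assumed to be the last valid 256-bit word of the band store, so the comparison is made before incrementing; the text does not say whether the hardware compares before or after the increment, and the function name is hypothetical.

```python
def next_bandstore_addr(addr, start_of_bandstore, end_of_bandstore):
    """Return the next 256-bit word address, wrapping at the end of the band store.

    Assumption: end_of_bandstore is itself a valid word, so the word after it
    is the start of the band store.
    """
    if addr == end_of_bandstore:
        return start_of_bandstore
    return addr + 1


# Walk six words starting near the end of a band store spanning words 2..5.
addr = 4
visited = []
for _ in range(6):
    visited.append(addr)
    addr = next_bandstore_addr(addr, 2, 5)
```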

[3438] 26.5.6 Tag Sizes

[3439] SoPEC allows for tags to be between 0 and 384 dots. A typical 2 mm tag requires 126 dots. Short tags do not change the internal bandwidth or throughput behaviours at all. Tag height is specified so as to allow the DRAM storage for raw tag data to be specified. Minimum tag width is a condition imposed by throughput limitations, so if the width is too small the TE cannot consistently produce 2 dots per cycle across several tags (there are also raw tag data bandwidth implications). Thinner tags still work; they just take longer and/or need scaling.

[3440] 26.6 Implementation

[3441] 26.6.1 Tag Encoder Architecture

[3442] A block diagram of the TE can be seen below.

[3443] The TE writes lines of bi-level tag plane data to the TFU for later reading by the HCU. The TE is responsible for merging the encoded tag data with the tag structure (interpreted from the TFS). Y-integer scaling of tags is performed in the TE with X-integer scaling of the tags performed in the TFU. The encoded tag layer is generated 2 bits at a time and output to the TFU at this rate. The HCU however only consumes 1 bit per cycle from the TFU. The TE must provide support for 126 dot Tags (2 mm densely packed) with 108 Tags per line with 128 bits per tag.

[3444] The tag encoder consists of a TFS interface that loads and decodes TFS entries, a tag data interface that loads tag raw data, encodes it, and provides bit values on request, and a state machine to generate appropriate addressing and control signals. The TE has two separate read interfaces to DRAM for raw tag data, TD, and tag format structure, TFS.

[3445] It is possible that the raw tag data interface, the TD, to the DIU could be replaced by a hardware state machine at a later stage. This would allow flexibility in the generation of tags. Support for Y scaling needs to be added to the PEC1 TE. The PEC1 TE already allows stalling at its output during a line when tfu_te_oktowrite is deasserted.

[3446] 26.6.2 Y-Scaling Output Lines

[3447] In order to support scaling in the Y direction the following modifications to the PEC1 TE are suggested to the Tag Data Interface, Tag Format Structure Interface and TE Top Level:

[3448] for Tag Data Interface: program the configuration registers of Table, firstTagLineHeight and tagMaxLine with their true values, i.e. not multiplied up by the scale factor YScale. Within the Tag Data interface there are two counters, countx and county, that have a direct bearing on the rawTagDataAddr generation. countx decrements as tags are read from DRAM. It is reset to NumTags[RtdTagSense] at the start of each line of tags. county is decremented as each line of tags is completely read from DRAM, i.e. when countx=0. Scaling may be performed by counting the number of times countx reaches zero and only decrementing county when this number reaches YScale. This will cause the TagData Interface to read each line of tag data NumTags[RtdTagSense]* YScale times.

[3449] for Tag Format Structure Interface: The implication of Y-scaling for the TFS is that each Tag Line Structure is used YScale times. This may be accomplished in either of two ways:

[3450] For each Tag Line Structure read it once from DRAM and reuse YScale times. This involves gating the control of TFS buffer flipping with YScale. Because of the way in which this advTfsLine and advTagLine related functionality is coded in the PEC1 TFS this solution is judged to be error-prone.

[3451] Fetch each TagLineStructure YScale times. This solution involves controlling the activity of currTfsAddr with YScale.

[3452] In SoPEC the TFS must supply five addresses to the DIU to read each individual Tag Line Structure. The DIU returns 4*64-bit words for each of the 5 accesses. This is different from the behaviour in PEC1, where one address is given and 17 data-words were returned by the DIU.

[3453] Since the behaviour of the currTfsAddr must be changed to meet the requirements of the SoPEC DIU it makes sense to include the Y-Scaling into this change i.e. a count of the number of completed sets of 5 accesses to the DIU is compared to YScale. Only when this count equals YScale can currTfsAddr be loaded with the base address of the next line's Tag Line Structure in DRAM, otherwise it is re-loaded with the base address of the current line's Tag Line Structure in DRAM.
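The address sequence implied by this scheme can be sketched as a generator. In the Python sketch below, addresses are counted in 256-bit words, the Tag Line Structures are assumed to be stored contiguously (five words each), and bandstore wrapping is ignored for brevity; the function name is hypothetical and this is not the TE's actual logic.

```python
WORDS_PER_TAG_LINE = 5  # five 256-bit DIU accesses per Tag Line Structure


def tfs_read_addresses(base_addr, tag_height, y_scale):
    """Yield the 256-bit word address of every TFS read for one tag.

    Each Tag Line Structure is fetched y_scale times before the address
    advances to the base of the next line's structure.
    """
    for line in range(tag_height):
        line_base = base_addr + line * WORDS_PER_TAG_LINE
        for _ in range(y_scale):            # completed sets of 5 accesses
            for word in range(WORDS_PER_TAG_LINE):
                yield line_base + word


addresses = list(tfs_read_addresses(base_addr=100, tag_height=2, y_scale=2))
```

With `y_scale` set to 1 the sequence reduces to a plain linear walk through the TFS.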

[3454] For Top Level: The Top Level of the TE has a counter, LinePos, which is used to count the number of completed output lines when in a tag gap or in a line of tags. At the start (i.e. top-left hand dot-pair) of a gap or tag, LinePos is loaded with either TagGapLine or TagMaxLine. The value of LinePos is decremented at the last dot-pair in the line. Y-Scaling may be accomplished by gating the decrement of LinePos based on the YScale value.

[3455] 26.6.3 TE Physical Hierarchy

[3456] FIG. 188 illustrates the structural hierarchy of the TE. The top level contains the Tag Data Interface (TDI), Tag Format Structure (TFS), and an FSM to control the generation of dot pairs along with a clocked process to carry out the PCU read/write decoding. There is also some additional logic for muxing the output data and generating other control signals.

[3457] At the highest level, the TE state machine processes the output lines of a page one line at a time, with the starting position either in an inter-tag gap or in a tag (a SoPEC may be only printing part of a tag due to multiple SoPECs printing a single line).

[3458] If the current position is within an inter-tag gap, an output of 0 is generated. If the current position is within a tag, the tag format structure is used to determine the value of the output dot, using the appropriate encoded data bit from the fixed or variable data buffers as necessary. The TE then advances along the line of dots, moving through tags and inter-tag gaps according to the tag placement parameters.
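The walk along one output line can be sketched as a generator of tag and gap runs. This Python sketch works in single dots with plain counts, whereas the TE registers count dot pairs and store values minus 1; the function and parameter names are illustrative, and the sketch does not look up TFS entries or data bits.

```python
def walk_line(dots_per_line, tag_width, gap_width, start_in_tag, start_remaining):
    """Yield (in_tag, run_length) segments covering one line of dots.

    start_in_tag / start_remaining describe the state at the first dot of the
    line, since a line may begin part-way through a tag or a gap.
    """
    if tag_width < 1 or gap_width < 0:
        raise ValueError("tag width must be at least 1 and gap width at least 0")
    in_tag, remaining, produced = start_in_tag, start_remaining, 0
    while produced < dots_per_line:
        run = min(remaining, dots_per_line - produced)
        if run > 0:
            yield in_tag, run       # gap runs output 0; tag runs consult the TFS
        produced += run
        in_tag = not in_tag
        remaining = tag_width if in_tag else gap_width


segments = list(walk_line(20, tag_width=6, gap_width=2,
                          start_in_tag=False, start_remaining=1))
```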

[3459] 26.6.4 IO Definitions

TABLE 175 TE Port List
Port Name | Pins | I/O | Description

Clocks and Resets
Pclk | 1 | In | SoPEC Functional clock.
prst_n | 1 | In | Global reset signal.

Bandstore Signals
cdu_endofbandstore[21:5] | 17 | In | Address of the end of the current band of data. 256-bit word aligned DRAM address.
cdu_startofbandstore[21:5] | 17 | In | Address of the start of the current band of data. 256-bit word aligned DRAM address.
te_finishedband | 1 | Out | TE finished band signal to PCU and ICU.

PCU Interface data and control signals
pcu_addr[8:2] | 7 | In | PCU address bus. 7 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
te_pcu_datain[31:0] | 32 | Out | Read data bus from the TE to the PCU.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_te_sel | 1 | In | Block select from the PCU. When pcu_te_sel is high both pcu_addr and pcu_dataout are valid.
te_pcu_rdy | 1 | Out | Ready signal to the PCU. When te_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on te_pcu_datain is valid.

TD (raw Tag Data) DIU Read Interface signals
td_diu_rreq | 1 | Out | TD requests DRAM read. A read request must be accompanied by a valid read address.
td_diu_radr[21:5] | 17 | Out | TD read address to DIU. 17 bits wide (256-bit aligned word).
diu_td_rack | 1 | In | Acknowledge from DIU that TD read request has been accepted and new read address can be placed on te_diu_radr.
diu_data[63:0] | 64 | In | Data from DIU to TE. First 64-bits are bits 63:0 of 256 bit word; Second 64-bits are bits 127:64 of 256 bit word; Third 64-bits are bits 191:128 of 256 bit word; Fourth 64-bits are bits 255:192 of 256 bit word.
diu_td_rvalid | 1 | In | Signal from DIU telling TD that valid read data is on the diu_data bus.

TFS (Tag Format Structure) DIU Read Interface signals
tfs_diu_rreq | 1 | Out | TFS requests DRAM read. A read request must be accompanied by a valid read address.
tfs_diu_radr[21:5] | 17 | Out | TFS read address to DIU. 17 bits wide (256-bit aligned word).
diu_tfs_rack | 1 | In | Acknowledge from DIU that TFS read request has been accepted and new read address can be placed on tfs_diu_radr.
diu_data[63:0] | 64 | In | Data from DIU to TE. First 64-bits are bits 63:0 of 256 bit word; Second 64-bits are bits 127:64 of 256 bit word; Third 64-bits are bits 191:128 of 256 bit word; Fourth 64-bits are bits 255:192 of 256 bit word.
diu_tfs_rvalid | 1 | In | Signal from DIU telling TFS that valid read data is on the diu_data bus.

TFU Interface data and control signals
tfu_te_oktowrite | 1 | In | Ready signal indicating TFU has space available and is ready to be written to. Also asserted from the point that the TFU has received its expected number of bytes for a line until the next te_tfu_wradvline.
te_tfu_wdata[7:0] | 8 | Out | Write data for TFU.
te_tfu_wdatavalid | 1 | Out | Write data valid signal. This signal remains high whenever there is valid output data on te_tfu_wdata.
te_tfu_wradvline | 1 | Out | Advance line signal strobed when the last byte in a line is placed on te_tfu_wdata.

[3460] 26.6.5 Configuration Registers

[3461] The configuration registers in the TE are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for the description of the protocol and timing diagrams for reading and writing registers in the TE. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the TE. Table 176 lists the configuration registers in the TE. Registers which address DRAM are 64-bit DRAM word aligned as this is the case for the PEC1 TE. SoPEC assumes a 256-bit DRAM word size. If the TE can be easily modified then the DRAM word addressing should be modified to 256-bit word aligned addressing. Otherwise, software should program these 64-bit word aligned addresses on a 256-bit DRAM word boundary.

TABLE 176 TE Configuration Registers
Address (TE_base+) | register name | # bits | value on reset | description

Control registers
0x00 | Reset | 1 | 1 | A write to this register causes a reset of the TE. This register can be read to indicate the reset state: 0 - reset in progress; 1 - reset not in progress.
0x04 | Go | 1 | 0 | Writing 1 to this register starts the TE. Writing 0 to this register halts the TE. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). NextBandEnable is cleared when Go is asserted. The TFU must be started before the TE is started. This register can be read to determine if the TE is running (1 = running, 0 = stopped).

Setup registers (constant for processing of a page)
0x40 | TfsStartAdr | 19 (64-bit aligned DRAM address - should start at a 256-bit aligned location) | 0 | Points to the first word of the first TFS line in DRAM.
0x44 | TfsEndAdr | 19 (64-bit aligned DRAM address - should start at a 256-bit aligned location) | 0 | Points to the first word of the last TFS line in DRAM.
0x48 | TfsFirstLineAdr | 19 (64-bit aligned DRAM address) | 0 | Points to the first word of the first TFS line to be encountered on the page. If the start of the page is in an inter-tag gap, then this value will be the same as TFSStartAdr since the first tag line reached will be the top line of a tag.
0x4C | DataRedun | | 0 | Defines the data to redundancy ratio for the Reed Solomon encoder. Symbol size is always 4 bits, Codeword size is always 15 symbols (60 bits). 0 - 5 data symbols (20 bits), 10 redundancy symbols (40 bits); 1 - 7 data symbols (28 bits), 8 redundancy symbols (32 bits).
0x50 | Decode2DEn | 1 | 0 | Determines whether or not the data bits are to be 2D decoded rather than redundancy encoded (each 2 bits of the data bits becomes 4 output data bits). 0 = redundancy encode data; 1 = decode each 2 bits of data into 4 bits.
0x54 | VariableDataPresent | 1 | 0 | Defines whether or not there is variable data in the tags. If there is none, no attempt is made to read tag data, and tag encoding should only reference fixed tag data.
0x58 | EncodeFixed | 1 | 0 | Determines whether or not the lower 40 (or 56) bits of fixed data should be encoded into 120 bits or simply used as is.
0x5C | TagMaxDotpairs | 8 | 0 | The width of a tag in dot-pairs, minus 1. Minimum 0, Maximum = 191.
0x60 | TagMaxLine | 9 | 0 | The number of lines in a tag, minus 1. Minimum 0, Maximum = 383.
0x64 | TagGapDot | 14 | 0 | The number of dot pairs between tags in the dot dimension minus 1. Only valid if TagGapPresent[bit 0] = 1.
0x68 | TagGapLine | 14 | 0 | Defines the number of dotlines between tags in the line dimension minus 1. Only valid if TagGapPresent[bit 1] = 1.
0x6C | DotPairsPerLine | 14 | 0 | Number of output dot pairs to generate per tag line.
0x70 | DotStartTagSense | 2 | 0 | Determines for the first/even (bit 0) and second/odd (bit 1) rows of tags whether or not the first dot position of the line is in a tag. 1 = in a tag, 0 = in an inter-tag gap.
0x74 | TagGapPresent | 2 | 0 | Bit 0 is 1 if there is an inter-tag gap in the dot dimension, and 0 if tags are tightly packed. Bit 1 is 1 if there is an inter-tag gap in the line dimension, and 0 if tags are tightly packed.
0x78 | YScale | 8 | 1 | Tag scale factor in Y direction. Output lines to the TFU will be generated YScale times.
0x80 to 0x84 | DotStartPos | 2 × 14 | 0 | Determines for the first/even (0) and second/odd (1) rows of tags the number of dotpairs remaining minus 1, in either the tag or inter-tag gap at the start of the line.
0x88 to 0x8C | NumTags | 2 × 8 | 0 | Determines for the first/even and second/odd rows of tags how many tags are present in a line (equals number of tags minus 1).

Setup band related registers
0xC0 | NextBandStartTagDataAdr | (64-bit aligned DRAM address - should start at a 256-bit aligned location) | | Holds the value of StartTagDataAdr for the next band. This value is copied to StartTagDataAdr when DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.
0xC4 | NextBandEndOfTagData | (64-bit aligned DRAM address) | | Holds the value of EndOfTagData for the next band. This value is copied to EndOfTagData when DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.
0xC8 | NextBandFirstTagLineHeight | 9 | 0 | Holds the value of FirstTagLineHeight for the next band. This value is copied to FirstTagLineHeight when DoneBand is 1 and NextBandEnable is 1, or when Go transitions from 0 to 1.
0xCC | NextBandEnable | | | When NextBandEnable is 1 and DoneBand is 1, then when te_finishedband is set at the end of a band: NextBandStartTagDataAdr is copied to StartTagDataAdr; NextBandEndOfTagData is copied to EndOfTagData; NextBandFirstTagLineHeight is copied to FirstTagLineHeight; DoneBand is cleared; NextBandEnable is cleared. NextBandEnable is cleared when Go is asserted.

Read-only band related registers
0xD0 | DoneBand | 1 | 0 | Specifies whether the tag data interface has finished loading all the tag data for the band. It is cleared to 0 when Go transitions from 0 to 1. When the tag data interface has finished loading all the tag data for the band, the te_finishedband signal is given out and the DoneBand flag is set. If NextBandEnable is 1 at this time then startTagDataAdr, endOfTagData and firstTaglineHeight are updated with the values for the next band and DoneBand is cleared. Processing of the next band starts immediately. If NextBandEnable is 0 then the remainder of the TE will continue to run, while the read control unit waits for NextBandEnable to be set before it restarts. Read only.
0xD4 | StartTagDataAdr | 19 (64-bit aligned DRAM address - should start at a 256-bit aligned location) | 0 | The start address of the current row of raw tag data. This initially points to the first word of the band's tag data, which should be aligned to a 128-bit boundary (i.e. the lower bit of this address should be 0). Read only.
0xD8 | EndOfTagData | 19 (64-bit aligned DRAM address) | 0 | Points to the address of the final tag for the band. When all the tag data up to and including address endOfTagData has been read in, the te_finishedband signal is given and the doneBand flag is set. Read only.
0xDC | FirstTagLineHeight | 9 | 0 | The number of lines minus 1 in the first tag encountered in this band. This will be equal to TagMaxLine if the band starts at a tag boundary. Read only.

Work registers (set before starting the TE and must not be touched between bands)
0x100 | LineInTag | 1 | 0 | Determines whether or not the first line of the page is in a line of tags or in an inter-tag gap. 1 - in a tag, 0 - in an inter-tag gap.
0x104 | LinePos | 14 | 0 | The number of lines remaining minus 1, in either the tag or the inter-tag gap at the start of the page.
0x110 to 0x11C | TagData | 4 × 32 | 0 | This 128 bit register must be set up initially with the fixed data record for the page. This is either the lower 40 (or 56) bits (and the EncodeFixed register should be set), or the lower 120 bits (and EncodeFixed should be clear).
The tagData[0] register contains the lower 32 bits and thetagData[3] register contains the upper 32 bits. This register is usedthroughout the tag encoding process to hold the next tag's variabledata. Work registers (set internally) Read-only from the point of viewof PCU register access 0x140 DotPos 14 0 Defines the number of dotpairsremaining in either the tag or inter-tag gap. Does not need to be setup.0x144 CurrTagPlaneAdr 14 0 The dot-pair number being generated. 0x148DotsInTag 1 0 Determines whether the current dot pair is in a tag or not1 - in a tag, 0 - in an inter-tag gap. 0x14C TagAltSense 1 0 Determineswhether the production of output dots is or the first (and subsequenteven) or second (and subsequent odd) row of tags. 0x154 CurrTFSAdr 19 0Points to the start next line of (64-bit the TFS to be read in. alignedDRAM address) 0x158 ReadsRemaining 4 0 Number of reads remaining in thecurrent burst from the raw tag data interface 0x15C CountX 8 0 Thenumber of tags remaining to be read (minus 1) by the raw tag datainterface for the current line. 0x160 CountY 9 0 The number of times(minus 1) the tag data for the current line of tags needs to be read inby the raw tag data interface. 0x164 RtdTagSense 1 0 Determines whetherthe raw tag data interface is currently reading even rows of tags (=0)or odd rows of tags (=1) with respect to the start of the page. Notethat this can be different from tagAltSense since the raw tag datainterface is reading ahead of the production of dots. 0x168RawTagDataAdr 19 0 The current read address (64-bit within the unencodedraw tag aligned data. DRAM address)
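The alignment rule for the DRAM-addressing registers in Table 176 can be illustrated with a short sketch. It assumes that such a register holds a 64-bit word address (the byte address divided by 8), so that a 256-bit boundary corresponds to the two low bits of the register value being zero. The helper names are hypothetical and are not part of SoPEC.

```python
WORD64_BYTES = 8      # one 64-bit DRAM word
WORD256_BYTES = 32    # one 256-bit SoPEC DRAM word


def to_te_dram_adr(byte_addr):
    """Convert a byte address to a 64-bit word address for a TE register.

    Raises ValueError unless the byte address sits on a 256-bit boundary.
    (Assumed encoding: register value = byte address / 8.)
    """
    if byte_addr % WORD256_BYTES != 0:
        raise ValueError("address is not 256-bit aligned")
    return byte_addr // WORD64_BYTES


def is_256bit_aligned(word64_addr):
    # Four 64-bit words make one 256-bit word, so the low 2 bits must be 0.
    return (word64_addr & 0x3) == 0
```

Under this assumption a byte address of 0x40 would be programmed as the word address 8, which passes the alignment check.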

[3462] The PCU accessible registers are divided amongst the TE top level and the TE sub-blocks. This is achieved by including write decoders in the sub-blocks as well as the top level, see FIG. 189. In order to perform reads, the sub-block registers are fed to the top level, where the read decode is carried out on all the PCU accessible TE registers.

[3463] 26.6.5.1 Starting the TE and Restarting the TE Between Bands

[3464] The TE must be started after the TFU.

[3465] For the first band of data, users set up NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight as well as other TE configuration registers. Users then set the TE's Go bit to start processing of the band. When the tag data for the band has finished being decoded, the te_finishedband interrupt will be sent to the PCU and ICU indicating that the memory associated with the first band is now free. Processing can now start on the next band of tag data.

[3466] In order to process the next band, NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight need to be updated before writing a 1 to NextBandEnable. There are 4 mechanisms for restarting the TE between bands:

[3467] a. te_finishedband causes an interrupt to the CPU. The TE will have set its DoneBand bit. The CPU reprograms the NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight registers, and sets NextBandEnable to restart the TE.

[3468] b. The CPU programs the TE's NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight registers and sets the NextBandEnable flag before the end of the current band. At the end of the current band the TE sets DoneBand. As NextBandEnable is already 1, the TE starts processing the next band immediately.

[3469] c. The PCU is programmed so that te_finishedband triggers the PCU to execute commands from DRAM to reprogram the NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight registers and set the NextBandEnable bit to start the TE processing the next band. The advantage of this scheme is that the CPU could process band headers in advance and store the band commands in DRAM ready for execution.

[3470] d. This is a combination of b and c above. The PCU (rather than the CPU in b) programs the TE's NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight registers and sets the NextBandEnable bit before the end of the current band. At the end of the current band the TE sets DoneBand and pulses te_finishedband. As NextBandEnable is already 1, the TE starts processing the next band immediately. Simultaneously, te_finishedband triggers the PCU to fetch commands from DRAM. The TE will have restarted by the time the PCU has fetched commands from DRAM. The PCU commands program the TE next band shadow registers and set the NextBandEnable bit.
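The shadow-register handover common to the four mechanisms can be sketched as a small software model. This is an illustrative model only, not the TE hardware: the class and method names are hypothetical, and the te_finishedband pulse is represented simply by the call to finish_band.

```python
class BandRegs:
    """Toy model of the TE next-band shadow registers (not the RTL)."""

    def __init__(self):
        self.next_start = self.next_end = self.next_height = 0
        self.start = self.end = self.height = 0
        self.next_band_enable = 0
        self.done_band = 0

    def program_next(self, start, end, height):
        # CPU or PCU writes the shadow registers, then sets NextBandEnable.
        self.next_start, self.next_end, self.next_height = start, end, height
        self.next_band_enable = 1
        self._try_restart()

    def finish_band(self):
        # The TE has read all tag data for the band: DoneBand is set
        # (te_finishedband would be pulsed at this point).
        self.done_band = 1
        return self._try_restart()

    def _try_restart(self):
        # Copy shadow values only when both DoneBand and NextBandEnable are 1.
        if self.done_band and self.next_band_enable:
            self.start, self.end, self.height = (
                self.next_start, self.next_end, self.next_height)
            self.done_band = 0
            self.next_band_enable = 0
            return True
        return False
```

In this model, calling program_next before finish_band corresponds to mechanisms b and d (the next band starts immediately), while calling finish_band first corresponds to mechanisms a and c (the TE waits until the shadow registers are programmed).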

[3471] After the first tag on the page, all bands have their first tag start at the top, i.e. NextBandFirstTagLineHeight=TagMaxLine. Therefore the same value of NextBandFirstTagLineHeight will normally be used for all bands. Certainly, NextBandFirstTagLineHeight should not need to change after the second time it is programmed.

[3472] 26.6.6 TE Top Level FSM

[3473] The following diagram illustrates the states in the FSM.

[3474] At the highest level, the TE state machine steps through the output lines of a page one line at a time, with the starting position either in an inter-tag gap (signal dotsintag=0) or in a tag (signals tfsvalid and tdvalid and lineintag=1). A SoPEC may be printing only part of a tag, due to multiple SoPECs printing a single line.

[3475] If the current position is within an inter-tag gap, an output of 0 is generated. If the current position is within a tag, the tag format structure is used to determine the value of the output dot, using the appropriate encoded data bit from the fixed or variable data buffers as necessary. The TE then advances along the line of dots, moving through tags and inter-tag gaps according to the tag placement parameters.
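The walk along a line of dots can be sketched as follows. This is a simplified model under stated assumptions: it works in whole dot pairs, uses the "minus 1" counter convention of the TagMaxDotpairs, TagGapDot and DotStartPos registers, and ignores the pipelining and the TFS/TD valid signals. The function and parameter names are hypothetical.

```python
def dot_line_pattern(dot_pairs_per_line, tag_max_dotpairs, tag_gap_dot,
                     start_in_tag, dot_start_pos, gap_present=True):
    """Return a list of flags (1 = in a tag, 0 = in a gap), one per dot pair."""
    flags = []
    dots_in_tag = 1 if start_in_tag else 0
    dotpos = dot_start_pos              # dot pairs remaining, minus 1
    for _ in range(dot_pairs_per_line):
        flags.append(dots_in_tag)
        if dotpos == 0:
            # End of the current tag or gap: switch and reload the counter.
            if dots_in_tag and gap_present:
                dots_in_tag, dotpos = 0, tag_gap_dot
            else:
                dots_in_tag, dotpos = 1, tag_max_dotpairs
        else:
            dotpos -= 1
    return flags
```

For example, with tags 3 dot pairs wide (register value 2) and gaps 2 dot pairs wide (register value 1), a 10 dot pair line starting at the left edge of a tag alternates three in-tag positions with two gap positions.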

[3476] Table 177 highlights the signals used within the FSM.

TABLE 177 Signals used within TE top level FSM

Signal name | Function
pclk | Sync clock used to register all data within the FSM
prst_n, te_reset | Reset signals
advtagline | 1-cycle pulse indicating to the TDI and TFS sub-blocks to move onto the next line of tag data
currdotlineadr[13:0] | Address counter starting 2 pclk ahead of currtagplaneadr to generate the correct dotpair for the current line
dotpos | Counter to identify how many dotpairs wide the tag/gap is
dotsintag | Signal identifying whether the dotpair is in a tag (1) or gap (0)
lineintag_temp | Identical to lineintag but generated 1 pclk earlier
linepos_shadow | Shadow register for linepos due to linepos being written to by 2 different processes
tagaltsense | Flag which alternates between tag/gap lines
te_state | FSM state variable
teplanebuf | 6-bit shift register used to format dotpairs into a byte for the TFU
wradvline | Advance line signal strobed when the last byte in a line is placed on te_tfu_wdata

[3477] Due to the 2 system clock delay in the TFS (both Table A and Table B outputs are registered) the TE FSM is working 2 system clock cycles ahead of the logic generating the write data for the TFU. As a result the following control signals had to be single/double registered on the system clock.

[3478] The tag_dot_line state can be broken down into 3 different stages.

[3479] Stage 1: The state tag_dot_line is entered due to the go signal becoming active. This state controls the writing of dotbytes to the TFU. As long as the tag line buffer address is not equal to the dotpairsperline register value and tfu_te_oktowrite is active, and there is valid TFS and TD available or taggaps, dotpairs are buffered into bytes and written to the TFU. The tag line buffer address is used internally but not supplied to the TFU since the TFU is a FIFO rather than the line store used in PEC1.

[3480] While generating the dotline of a tag/gap line (lineintag flag=1) the dot position counter dotpos is decremented/reloaded (with tagmaxdotpairs or taggapdot) as the TE moves between tags/gaps. The dotsintag flag is toggled between tags/gaps (0 for a gap, 1 for a tag). This pattern continues until the end of a dotline approaches (currdotlineadr==dotpairsperline).

[3481] 2 system clock cycles before the end of the dotline the lineintag and tagaltsense signals must be prepared for the next dotline, be it a tag/gap dotline or a purely gap dotline.

[3482] Stage 2: At this point the end of a dot line is reached, so it is time to decrement the linepos counter if still in a tag/gap row, or to reload the linepos register and dotpos counter and reprogram the dotsintag flag if going onto another tag/gap or pure gap row. Any signal with the _temp extension is a register that is updated a cycle early in order for the real register to get its correct value while switching between dot lines and tag rows when the dotpos and linepos counters reach zero, i.e. when dotpos=0 the end of a tag/gap has been reached, and when linepos=0 the end of a tag row is reached. This stage uses the signals lineintag_temp and tagaltsense which were generated one system clock cycle earlier in Stage 1.

[3483] Stage 3: This stage implements the writing of dotpairs to the correct part of the 6-bit shift register based on the LSBs of currtagplaneadr and also implements the counter for the currtagplaneadr. The currtagplaneadr is reset on reaching currtagplaneadr=(dotpairsperline−1). All the qualifier signals for this stage, e.g. dotsintag, are delayed by 2 system clock cycles, i.e. the currtagplaneadr (which is the internal write address not needed by the TFU) cannot be incremented until the dotpairs are available, which is always 2 system clock cycles later than when currdotlineadr is incremented.

[3484] The wradvline and advtagline pulses are generated using the same logic (currently separated in the PEC1 Tag Encoder VHDL for clarity). Both of these pulses are used to update further registers, which is why they do not use the qualifiers delayed by 2 system clock cycles.

[3485] 26.6.7 Combinational Logic

[3486] The TDI is responsible for providing the information data for a tag while the TFSI is responsible for deciding whether a particular dot on the tag should be printed as background pattern or tag information. Every dot within a tag's boundary is either an information dot or part of the background pattern.

[3487] The resulting lines of dots are stored in the TFU.

[3488] The TFSI reads one Tag Line Structure (TLS) from the DIU for every dot line of tags. Depending on the current printing position within the tag (indicated by the signal tagdotnum), the TFS interface outputs dot information for two dots and, if necessary, the corresponding read addresses for encoded tag data. The read addresses are supplied to the TDI, which outputs the corresponding data values.

[3489] These data values (tdi_etd0 and tdi_etd1) are then combined with the dot information (tfsi_ta_dot0 and tfsi_ta_dot1) to produce the dot values that will actually be printed on the page (dots), see FIG. 192.

[3490] The signal lastdotintag is generated by checking that the dots are in a tag (dotsintag=1) and that the dot position counter dotpos is equal to zero. It is also used by the TFS to load the index address register with zeros at the end of a tag, as this is always the starting index when going from one tag to the next. lastdotintag is gated with advtagline in the TFSi (Table C), where the adv_tfs_line pulse is used to update the Table C address register for the new tag line. This is because lastdotintag occurs a cycle earlier than adv_tfs_line, which would result in the wrong Table C value for the last dotpair. lastdotintag is also used in the TDi FSM (etd_switch state) to pulse the etd_advtag signal, hence switching buffers in the ETDi for the next tag.

[3491] The signal lastdotintag1 is identical to lastdotintag except that it is combinatorially generated (1 cycle earlier than lastdotintag, except at the end of a tagline). The lastdotintag1 signal is only used in the TDi to reset the tdvalid signal on the cycle when dotpos=0. Note that the comparison is UNSIGNED(currdotlineadr)=UNSIGNED(dotpairsperline)−1, not UNSIGNED(currdotlineadr)=UNSIGNED(dotpairsperline)−2 as in the lastdotintag_gen process, as this is a combinatorial process.

[3492] The dotposvalid signal is created based on being in a tag line (lineintag1=1), dots being in a tag (dotsintag1=1), having a valid tag format structure available (tfsvalid1=1) and having encoded tag data available (tdvalid1=1). Note that each of the qualifier signals is delayed by 1 pclk cycle due to the registering of Table A output data into Table C where dotposvalid is used. The dotposvalid signal is used as an enable to load the Table C address register with the next index into Table B, which in turn provides the 2 addresses to make 2 dots available.

[3493] The signal te_tfu_wdatavalid can only be active if in a taggap or if valid tag data is available (tdvalid2 and tfsvalid2), and currtagplaneadr(1:0) equals 11, i.e. a byte of data has been generated by combining four dotpairs.

[3494] The signal tagdotnum tells the TFS how many dotpairs remain in a tag/gap. It is calculated by subtracting the value in the dotpos counter from the value programmed in the tagmaxdotpairs register.
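The tagdotnum subtraction and the grouping of four dotpairs into one TFU byte can be sketched as below. The sketch makes an assumption that the text does not state: the first dotpair of each group is placed in the least significant 2 bits of the byte. The function names are hypothetical, and the hardware's 6-bit shift register is replaced by a plain accumulator.

```python
def tagdotnum(tagmaxdotpairs, dotpos):
    # As stated in the text: tagmaxdotpairs register value minus dotpos counter.
    return tagmaxdotpairs - dotpos


def pack_dotpairs(dotpairs):
    """Pack 2-bit dot pairs into bytes, four pairs per byte.

    Assumption: the first dot pair lands in the least significant bits.
    A trailing group of fewer than four pairs is not emitted.
    """
    out = []
    buf = 0
    for adr, pair in enumerate(dotpairs):
        buf |= (pair & 0x3) << (2 * (adr & 0x3))
        if (adr & 0x3) == 0x3:      # low address bits equal binary 11
            out.append(buf)
            buf = 0
    return out
```

A byte is only produced when the two low bits of the dotpair address are both 1, which mirrors the condition on te_tfu_wdatavalid.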

[3495] 26.7 Tag Data Interface (TDI)

[3496] 26.7.1 I/O Specification

TABLE 178 TDI Port List

Signal name | I/O | Description

Clocks and Resets
pclk | In | SoPEC system clock
prst_n | In | Active-low, synchronous reset in pclk domain.

DIU Read Interface Signals
diu_data[63:0] | In | Data from DRAM.
td_diu_rreq | Out | Data request to DRAM.
td_diu_radr[21:5] | Out | Read address to DRAM.
diu_td_rack | In | Data acknowledge from DRAM.
diu_td_rvalid | In | Data valid signal from DRAM.

PCU Interface Data, Control Signals and
pcu_dataout[31:0] | In | PCU writes this data.
pcu_addr[8:2] | In | PCU accesses this address.
pcu_rwn | In | Global read/write-not signal from PCU.
pcu_te_sel | In | PCU selects TE for r/w access.
pcu_te_reset | In | PCU reset.
td_te_doneband, td_te_dataredun, td_te_decode2den, td_te_variabledatapresent, td_te_encodefixed, td_te_numtags0, td_te_numtags1, td_te_starttagdataadr, td_te_rawtagdataadr, td_te_endoftagdata, td_te_firsttaglineheight, td_te_tagdata0, td_te_tagdata1, td_te_tagdata2, td_te_tagdata3, td_te_countx, td_te_county, td_te_rtdtagsense, td_te_readsremaining | Out | PCU readable registers.

TFS (Tag Format Structure)
tfsi_adr0[8:0] | In | Read address for dot0
tfsi_adr1[8:0] | In | Read address for dot1

Bandstore Signals
cdu_startofbandstore[24:0] | In | Start memory area allocated for page bands
cdu_endofbandstore[24:0] | In | Last address of the memory allocated for page bands
te_finishedband | Out | Tag encoder band finished

[3497] 26.7.2 Introduction

[3498] The tag data interface is responsible for obtaining the raw tag data and encoding it as required by the tag encoder. The smallest typical tag placement is 2 mm×2 mm, which means a tag is at least 126 1600 dpi dots wide.
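The figure of 126 dots follows from converting 2 mm to inches and multiplying by the 1600 dpi resolution. A minimal check, using a hypothetical helper name:

```python
MM_PER_INCH = 25.4


def dots_for_mm(size_mm, dpi=1600):
    """Number of dots (possibly fractional) covering size_mm at the given dpi."""
    return size_mm / MM_PER_INCH * dpi


width = dots_for_mm(2.0)    # roughly 125.98 dots
dots = round(width)         # rounds to 126
```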

[3499] In PEC1, in order to keep up with the HCU which processes 2 dots per cycle, the tag data interface has been designed to be capable of encoding a tag in 63 cycles. This is actually accomplished in approximately 52 cycles within PEC1. For SoPEC the TE need only produce one dot per cycle; it should be able to produce tags in no more than twice the time taken by the PEC1 TE. Moreover, any change in implementation from two dots to one dot per cycle should not lose the 63/52 cycle performance edge attained in the PEC1 TE.

[3500] As shown in FIG. 198, the tag data interface contains a raw tag data interface FSM that fetches tag data from DRAM, two symbol-at-a-time GF(2⁴) Reed-Solomon encoders, an encoded data interface and a state machine for controlling the encoding process. It also contains a tagData register that needs to be set up to hold the fixed tag data for the page.

[3501] The type of encoding used depends on the registers TE_encodefixed, TE_dataredun and TE_decode2den, the options being:

[3502] (15,5) RS coding, where every 5 input symbols are used to produce 15 output symbols, so the output is 3 times the size of the input. This can be performed on fixed and variable tag data.

[3503] (15,7) RS coding, where every 7 input symbols are used to produce 15 output symbols, so for the same number of input symbols, the output is not as large as the (15,5) code (for more details see section 26.7.6 on page 435). This can be performed on fixed and variable tag data.

[3504] 2D decoding, where each 2 input bits are used to produce 4 output bits. This can be performed on fixed and variable tag data.

[3505] no coding, where the data is simply passed into the Encoded Data Interface. This can be performed on fixed data only.

[3506] Each tag is made up of fixed tag data (i.e. this data is the same for each tag on the page) and variable tag data (i.e. different for each tag on the page).

[3507] Fixed tag data is either stored in DRAM as 120-bits when it is already coded (or no coding is required), 40-bits when (15,5) coding is required or 56-bits when (15,7) coding is required. Once the fixed tag data is coded it is 120-bits long. It is then stored in the Encoded Tag Data Interface. The variable tag data is stored in the DRAM in uncoded form. When (15,5) coding is required, the 120-bits stored in DRAM are encoded into 360-bits. When (15,7) coding is required, the 112-bits stored in DRAM are encoded into 240-bits. When 2D decoding is required the 120-bits stored in DRAM are converted into 240-bits. In each case the encoded bits are stored in the Encoded Tag Data Interface.
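The bit counts quoted above follow from the 4-bit symbol size and the 15-symbol codeword size. The sketch below reproduces that arithmetic; the function names are hypothetical and it computes sizes only, not the encoding itself.

```python
SYMBOL_BITS = 4          # symbol size is always 4 bits
CODEWORD_SYMBOLS = 15    # codeword size is always 15 symbols (60 bits)


def rs_encoded_bits(raw_bits, data_symbols):
    """Bits produced when raw_bits are split into (15, data_symbols) codewords."""
    data_bits = data_symbols * SYMBOL_BITS
    if raw_bits % data_bits:
        raise ValueError("raw data is not a whole number of codewords")
    codewords = raw_bits // data_bits
    return codewords * CODEWORD_SYMBOLS * SYMBOL_BITS


def decode2d_bits(raw_bits):
    # Each 2 input bits become 4 output bits.
    return raw_bits * 2
```

For instance, 120 bits of variable data form six (15,5) codewords and so expand to 360 bits, while 112 bits form four (15,7) codewords and expand to 240 bits.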

[3508] The encoded fixed and variable tag data are eventually used to print the tag.

[3509] The fixed tag data is loaded in once from the DRAM at the start of a page. It is encoded as necessary and is then stored in one of the 8×15-bits registers/RAMs in the Encoded Tag Data Interface. This data remains unchanged in the registers/RAMs until the next page is ready to be processed.

[3510] The 120-bits of unencoded variable tag data for each tag are stored in four 32-bit words. The TE re-reads the variable tag data for a particular tag from DRAM every time it produces that tag. The variable tag data FIFO which reads from DRAM has enough space to store 4 tags.

[3511] 26.7.2.1 Bandstore Wrapping

[3512] Both TD and TFS storage in DRAM can wrap around the bandstore area. The bounds of the band store are described by inputs from the CDU shown in Table. The TD and TFS DRAM interfaces therefore support bandstore wrapping. If the TD or TFS DRAM interface increments an address it is checked to see if it matches the end of bandstore address. If so, then the address is mapped to the start of the bandstore.
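The wrapping rule can be sketched as a single address-advance function. It assumes that the end of bandstore address is the last valid address of the area and that the comparison is made on the current address before it is incremented; the text does not spell out either point, and the function name is hypothetical.

```python
def next_bandstore_adr(adr, start_of_bandstore, end_of_bandstore):
    """Advance a DRAM word address by one, wrapping inside the bandstore.

    Assumption: end_of_bandstore is the last valid address of the area.
    """
    if adr == end_of_bandstore:
        return start_of_bandstore
    return adr + 1
```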

[3513] 26.7.3 Data Flow

[3514] An overview of the dataflow through the TDI can be seen in FIG. 198 below.

[3515] The TD interface consists of the following main sections:

[3516] the Raw Tag Data Interface: fetches tag data from DRAM;

[3517] the tag data register;

[3518] 2 Reed Solomon encoders: each encodes one 4-bit symbol at a time;

[3519] the Encoded Tag Data Interface: supplies encoded tag data for output;

[3520] two 2D decoders.

[3521] The main performance specification for PEC1 is that the TE must be able to output data at a continuous rate of 2 dots per cycle.

[3522] 26.7.4 Raw Tag Data Interface

[3523] The raw tag data interface (RTDI) provides a simple means of accessing raw tag data in DRAM. The RTDI passes tag data into a FIFO where it can be subsequently read as required. The 64-bit output from the FIFO can be read directly, with the value of the wr_rd_counter being used to set/reset the enable signal (rtdAvail). The FIFO is clocked out on receipt of an rtdRd signal from the TS FSM.

[3524] FIG. 199 shows a block diagram of the raw tag data interface.

[3525] 26.7.4.1 RTDI FSM

[3526] The RTDI state machine is responsible for keeping the raw tag FIFO full. The state machine reads the line of tag data once for each printline that uses the tag. This means a given line of tag data will be read TagHeight times. Typically this will be 126 times or more, based on an approximately 2 mm tag. Note that the first line of tag data may be read fewer times since the start of the page may be within a tag. In addition odd and even rows of tags may contain different numbers of tags. Section 26.6.5.1 outlines how to start the TE and restart it between bands. Users must set the NextBandStartTagDataAdr, NextBandEndOfTagData, NextBandFirstTagLineHeight and numTags[0], numTags[1] registers before starting the TE by asserting Go.

[3527] To restart the tag encoder for second and subsequent bands of a page, the NextBandStartTagDataAdr, NextBandEndOfTagData and NextBandFirstTagLineHeight registers need to be updated (typically numTags[0] and numTags[1] will be the same if the previous band contains an even number of tag rows) and NextBandEnable set. See Section 26.6.5.1 for a full description of the four ways of reprogramming the TE between bands.

[3528] The tag data is read once for every printline containing tags. When maximally packed, a row of tags contains 163 tags (see Table n page 465 on page 408).

[3529] The RTDI State Flow diagram is shown in FIG. 200. An explanation of the states follows:

idle: Stay in the idle state if there is no variable data present. If there is variable data present and there are at least 4 spaces left in the FIFO then request a burst of 2 tags from the DRAM (1*256 bits). Counter countx is assigned the number of tags in an even/odd line, which depends on the value of register rtdtagsense. Down-counter county is assigned the number of dot lines high a tag will be (min 126). Initially it must be set to the firsttaglineheight value as the TE may be between pages (i.e. a partial tag). For normal tag generation county will take the value of the tagmaxline register.

[3530] diu_access: The diu_access state will generate a request to the DRAM if there are at least 4 spaces in the FIFO. This is indicated by the counter wr_rd_counter which is incremented/decremented on writes/reads of the FIFO. As long as wr_rd_counter is less than 4 (FIFO is 8 high) there must be 4 locations free. A control signal called td_diu_radrvalid is generated for the duration of the DRAM burst access. Addresses are sent in bursts of 1. The counter burst_count controls this signal (this will involve modification to existing TE code). If there is an odd number of tags in a line then the last DRAM read will contain a tag in the first 128 bits and padding in the final 128 bits.

[3531] fifo_load: This state controls the addressing to the DRAM. Counters countx and county are used to monitor whether the TE is processing a line of dots within a row of tags. When countx is zero it means all tag dots for this row are complete. When county is zero it means the TE is on the last line of dots (prior to Y scaling) for this row of tags. When a row of tags is complete the sense of rtdtagsense is inverted (odd/even). The rawtagdataadr is compared to the te_endoftagdata address. If rawtagdataadr=endoftagdata the doneband signal is set, the finishedband signal is pulsed, and the FSM enters the rtd_stall state until the doneband signal is reset to zero by the PCU, by which time the rawtagdata, endoftagdata and firsttaglineheight registers are set up with new values to restart the TE. This state is used to count the 64-bit reads from the DIU. Each time diu_td_rvalid is high rtd_data_count is incremented by 1. The compare of rtd_data_count=rtd_num is necessary to find out when either all 4*64-bit data has been received or n*64-bit data (depending on a match of rawtagdataadr=endoftagdata in the middle of a set of 4*64-bit values being returned by the DIU).

rtd_stall: This state waits for the doneband signal to be reset (see page 426 for a description of how this occurs). Once reset the FSM returns to the idle state. This state also performs the same count on the diu_data read as above in the case where diu_td_rvalid has not gone high by the time the addressing is complete and the end of band data has been reached, i.e. rawtagdataadr=endoftagdata.
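The FIFO space check used by the idle and diu_access states can be sketched with an occupancy counter. The model below only tracks the wr_rd_counter value, applying the condition exactly as worded in the text (fewer than 4 entries in use out of 8); the class name and its exceptions are hypothetical.

```python
FIFO_DEPTH = 8      # raw tag FIFO entries, each 64 bits wide
BURST_WORDS = 4     # one 256-bit DRAM word arrives as 4 x 64-bit entries


class RawTagFifoModel:
    """Occupancy counter only; the stored data is not modelled."""

    def __init__(self):
        self.wr_rd_counter = 0

    def can_request_burst(self):
        # Condition from the text: wr_rd_counter is less than 4.
        return self.wr_rd_counter < BURST_WORDS

    def write(self):
        if self.wr_rd_counter >= FIFO_DEPTH:
            raise OverflowError("FIFO full")
        self.wr_rd_counter += 1

    def read(self):
        if self.wr_rd_counter == 0:
            raise IndexError("FIFO empty")
        self.wr_rd_counter -= 1
```

Because a request is only allowed when fewer than 4 entries are in use, a full burst of 4 entries can never overflow the 8-entry FIFO in this model.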

[3532] 26.7.5 TDI State Machine

[3533] The tag data state machine has two processing phases. The first processing phase is to encode the fixed tag data stored in the 128-bit (2×64-bit) tag data register. The second is to encode tag data as it is required by the tag encoder.

[3534] When the Tag Encoder is started up, the fixed tag data is already preloaded in the 128 bit tag data record. If encodeFixed is set, then the 2 codewords stored in the lower bits of the tag data record need to be encoded: 40 bits if dataRedun=0, and 56 bits if dataRedun=1. If encodeFixed is clear, then the lower 120 bits of the tag data record must be passed to the encoded tag data interface without being encoded.

[3535] When encodeFixed is set, the symbols derived from codeword 0 are written to codeword 6 and the symbols derived from codeword 1 are written to codeword 7. The data symbols are stored first and then the remaining redundancy symbols are stored afterwards, for a total of 15 symbols. Thus, when dataRedun=0, the 5 symbols derived from bits 0-19 are written to symbols 0-4, and the redundancy symbols are written to symbols 5-14. When dataRedun=1, the 7 symbols derived from bits 0-27 are written to symbols 0-6, and the redundancy symbols are written to symbols 7-14.
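The layout of data symbols followed by redundancy symbols is that of a systematic Reed-Solomon code. The sketch below shows a generic systematic encoder over GF(2⁴) for the (15,5) and (15,7) cases. It is not the TE's encoder: the primitive polynomial x⁴+x+1, the generator roots α¹ to α^(n−k), and the symbol ordering are all assumptions chosen for illustration, and every function name is hypothetical.

```python
# GF(2^4) log/antilog tables, assuming primitive polynomial x^4 + x + 1.
EXP = [0] * 30
LOG = [0] * 16
_x = 1
for _i in range(15):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x10:
        _x ^= 0x13
for _i in range(15, 30):
    EXP[_i] = EXP[_i - 15]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(nroots):
    """Product of (x + alpha^i) for i = 1..nroots, highest degree first."""
    g = [1]
    for i in range(1, nroots + 1):
        nxt = g + [0]                       # g * x
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[i])  # plus g * alpha^i
        g = nxt
    return g


def rs_encode(data, n=15):
    """Return data symbols followed by n - len(data) redundancy symbols."""
    k = len(data)
    g = generator_poly(n - k)
    rem = list(data) + [0] * (n - k)
    for i in range(k):                      # polynomial long division by g
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return list(data) + rem[k:]


def syndromes(codeword, nroots):
    """Evaluate the codeword at alpha^1..alpha^nroots (all 0 if valid)."""
    out = []
    for i in range(1, nroots + 1):
        acc = 0
        for c in codeword:
            acc = gf_mul(acc, EXP[i]) ^ c
        out.append(acc)
    return out
```

With 5 data symbols the encoder appends 10 redundancy symbols, and with 7 data symbols it appends 8, matching the two dataRedun settings; a valid codeword has all-zero syndromes under the same assumed roots.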

[3536] When encodeFixed is clear, the 120 bits of fixed data are copied directly to codewords 6 and 7. The TDI State Flow diagram is shown in FIG. 202. An explanation of the states follows.

[3537] idle: In the idle state wait for the tag encoder go signal (top_go=1). The first task is to either store or encode the fixed data. Once the fixed data is stored or encoded/stored the donefixed flag is set. If there is no variable data the FSM returns to the idle state, which is why the donefixed flag is checked before advancing, i.e. the fixed data is only stored/encoded once.

[3538] fixed_data: In the fixed_data state the FSM must decode whether to directly store the fixed data in the ETDi or if the fixed data needs to be either (15:5) (40-bits) or (15:7) (56-bits) RS encoded or 2D decoded. The values stored in registers encodefixed, dataredun and decode2den determine what the next state should be.

[3539] bypass_to_etdi: The bypass_to_etdi state takes 120-bits of fixed data (pre-encoded) from the tag_data(127:0) register and stores it in the 15*8 (by 2 for simultaneous reads) buffers. The data is passed from the tag data register through 3 levels of muxing (level1, level2, level3) where it enters the RS0/RS1 encoders, which are now in a straight through mode (i.e. control_5 and control_7 are zero, hence the data passes straight from the input to the output). The MSBs of the etd_wr_adr must be high to store this data as codewords 6,7.

[3540] etd_buf_switch: This state is used to set the tdvalid signal and pulse the etd_adv_tag signal, which in turn is used to switch the read/write sense of the ETDi buffers (wrsb0). The firsttime signal is used to identify the first time a tag is encoded. If zero it means read the tag data from the RTDi FIFO and encode. Once encoded and stored the FSM returns to this state where it evaluates the sense of tdvalid. First time around it will be zero, so this sets tdvalid and returns to the readtagdata state to fill the 2nd ETDi buffer. After this the FSM returns to this state and waits for the lastdotintag signal to arrive. In between tags, when the lastdotintag signal is received, the etd_adv_tag is pulsed and the FSM goes to the readtagdata state. However, if the lastdotintag signal arrives at the end of a line there is an extra 1 cycle delay introduced in generating the etd_adv_tag pulse (via etd_adv_tag_endofline) due to the pipelining in the TFS. This allows all of the previous tag to be read from the correct buffer and a seamless transfer to the other buffer for the next line.

[3541] readtagdata: The readtagdata state waits to receive a rtdavail signal from the raw tag data interface which indicates there is raw tag data available. The tag_data register is 128-bits so it takes 2 pulses of the rtdrd signal to get the 2*64-bits into the tag_data register. If the rtdavail signal is set, rtdrd is pulsed for 1 cycle and the FSM steps onto the loadtagdata state. Initially the flag first64bits will be zero. The 64-bits of rtd are assigned to tag_data[63:0] and the flag first64bits is set to indicate the first raw tag data read is complete. The FSM then steps back to the readtagdata state where it generates the second rtdrd pulse. The FSM then steps onto the loadtagdata state, where the second 64-bits of raw tag data are assigned to tag_data[127:64].

loadtagdata: The loadtagdata state writes the raw tag data into the tag_data register from the RTDi FIFO. The first64bits flag is reset to zero as the tag_data register now contains 120/112 bits of variable data. A decode of whether to (15:5) or (15:7) RS encode or 2D decode this data decides the next state.

[3542] rs_15_5: The rs_15_5 (Reed Solomon (15:5) mode) state either encodes 40-bit Fixed data or 120-bit Variable data and provides the encoded tag data write address and write enable (etd_wr_adr and etdwe respectively). Once the fixed tag data is encoded the donefixed flag is set, as this only needs to be done once per page. The variabledatapresent register is then polled to see if there is variable data in the tags. If there is variable data present then this data must be read from the RTDi and loaded into the tag_data register. Else the tdvalid flag must be set and the FSM returns to the idle state. control_5 is a control bit for the RS Encoder and controls the feedforward and feedback muxes that enable (15:5) encoding.

[3543] The rs_15_5 state also generates the control signals for passing 120 bits of variable tag data to the RS encoder in 4-bit symbols per clock cycle. rs_counter is used both to control the level1_mux and to act as the 15-cycle counter of the RS Encoder. This logic cycles for a total of 3*15 cycles to encode the 120 bits.

[3544] rs_15_7: The rs_15_7 state is similar to the rs_15_5 state except the level1_mux has to select 7 4-bit symbols instead of 5.

[3545] decode_2d_15_5, decode_2d_15_7: The decode_2d states provide the control signals for passing the 120-bit variable data to the 2D decoder. The 2 LSBs are decoded to create 4 bits. The 4 bits from each decoder are combined and stored in the ETDi. Next the 2 MSBs are decoded to create 4 bits. Again the 4 bits from each decoder are combined and stored in the ETDi.

[3546] As the corresponding figure shows, there are 3 stages of muxing between the Tag Data register and the RS encoders or 2D decoders. Levels 1-2 are controlled by level1_mux and level2_mux, which are generated within the TDi FSM, as is the write address to the ETDi buffers (etd_wr_adr).

[3547] FIGS. 203 through 208 illustrate the mappings used to store the encoded fixed and variable tag data in the ETDi buffers.

[3548] 26.7.6 Reed Solomon (RS) Encoder

[3549] 26.7.7 Introduction

[3550] A Reed Solomon code is a non-binary block code. If a symbol consists of m bits then there are q=2^(m) possible symbols defining the code alphabet. In the TE, m=4 so the number of possible symbols is q=16.

[3551] An (n,k) RS code is a block code with k information symbols and n code-word symbols. RS codes have the property that the code word length n is limited to at most q+1 symbols.

[3552] In the case of the TE, both (15,5) and (15,7) RS codes can be used. This means that up to 5 and 4 symbol errors respectively can be corrected per codeword, since an (n,k) RS code can correct up to (n-k)/2 symbol errors.

[3553] Only one type of RS coder is used at any particular time. The RScoder to be used is determined by the registers TE_dataredun andTE_decode2den:

[3554] TE_dataredun=0 and TE_decode2den=0, then use the (15,5) RS coder

[3555] TE_dataredun=1 and TE_decode2den=0, then use the (15,7) RS coder

[3556] For a (15,k) RS code with m=4, k 4-bit information symbols applied to the coder produce 15 4-bit codeword symbols at the output. In the TE, the code is systematic, so the first k codeword symbols are the same as the k input information symbols.

[3557] A simple block diagram can be seen in.

[3558] 26.7.8 I/O Specification

[3559] A I/O diagram of the RS encoder can be seen in.

[3560] 26.7.9 Proposed Implementation

[3561] In the case of the TE, (15,5) and (15,7) codes are to be usedwith 4-bits per symbol.

[3562] The primitive polynomial is p(x)=x⁴+x+1

[3563] In the case of the (15,5) code, this gives a generator polynomialof

g(x)=(x+α)(x+α²)(x+α³)(x+α⁴)(x+α⁵)(x+α⁶)(x+α⁷)(x+α⁸)(x+α⁹)(x+α¹⁰)

g(x)=x¹⁰+α²x⁹+α³x⁸+α⁹x⁷+α⁶x⁶+α¹⁴x⁵+α²x⁴+αx³+α⁶x²+αx+α¹⁰

g(x)=x¹⁰+g₉x⁹+g₈x⁸+g₇x⁷+g₆x⁶+g₅x⁵+g₄x⁴+g₃x³+g₂x²+g₁x+g₀

[3564] In the case of the (15,7) code, this gives a generator polynomialof

h(x)=(x+α)(x+α²)(x+α³)(x+α⁴)(x+α⁵)(x+α⁶)(x+α⁷)(x+α⁸)

h(x)=x⁸+α¹⁴x⁷+α²x⁶+α⁴x⁵+α²x⁴+α¹³x³+α⁵x²+α¹¹x+α⁶

h(x)=x⁸+h₇x⁷+h₆x⁶+h₅x⁵+h₄x⁴+h₃x³+h₂x²+h₁x+h₀

[3565] The output code words are produced by dividing the generatorpolynomial into a polynomial made up from the input symbols.

[3566] This division is accomplished using the circuit shown in FIG.211.

[3567] The data in the circuit are Galois Field elements so addition andmultiplication are performed using special circuitry. These areexplained in the next sections.

[3568] The RS coder can operate either in (15,5) or (15,7) mode. Theselection is made by the registers TE_dataredun and TE_decode2den.

[3569] When operating in (15,5) mode control_7 is always zero and when operating in (15,7) mode control_5 is always zero.

[3570] Firstly consider (15,5) mode i.e. TE_dataredun is set to zero.

[3571] For each new set of 5 input symbols, processing is as follows:

[3572] The 4 bits of the first symbol d₀ are fed to the input port rs_data_in(3:0) and control_5 is set to 0. mux2 is set so as to use the output as feedback. Because control_5 is zero, mux4 selects the input (rs_data_in) as the output (rs_data_out). Once the data has settled (<<1 cycle), the shift registers are clocked. The next symbol d₁ is then applied to the input, and after the data has settled the shift registers are clocked again. This is repeated for the next 3 symbols d₂, d₃ and d₄. As a result, the first 5 outputs are the same as the inputs. After 5 cycles, the shift registers contain the next 10 required outputs. control_5 is set to 1 for the next 10 cycles so that zeros are fed back by mux2 and the shift register values are fed to the output by mux3 and mux4 by simply clocking the registers.

[3573] A timing diagram is shown below.

[3574] Secondly consider (15,7) mode i.e. TE_dataredun is set to one.

[3575] In this case processing is similar to above except that control_7 stays low while 7 symbols (d₀, d₁ . . . d₆) are fed in. As well as being fed back into the circuit, these symbols are fed to the output.

[3576] After these 7 cycles, control_7 is set to 1 and the contents of the shift registers are fed to the output.

[3577] A timing diagram is shown below.

[3578] The enable signal can be used to start/reset the counter and theshift registers.

[3579] The RS encoders can be designed so that encoding starts on arising enable edge. After 15 symbols have been output, the encoder stopsuntil a rising enable edge is detected. As a result there will be adelay between each codeword.

[3580] Alternatively, once the enable goes high the shift registers arereset and encoding will proceed until it is told to stop. rs_data_inmust be supplied at the correct time. Using this method, data can becontinuously output at a rate of 1 symbol per cycle, even over a fewcodewords.

[3581] Alternatively, the RS encoder can request data as it requires.
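The (15,5) and (15,7) processing described above can be modelled in software as a shift-register divider. The sketch below is a generic systematic RS encoder of that kind, written under stated assumptions: the function names, the integer encoding of symbols (bit i is the coefficient of x^i) and the register update order are illustrative, and the model is not claimed to match the circuit of FIG. 211 gate for gate.

```python
# Sketch of a systematic (15,k) RS encoder modelled as an LFSR divider.
PRIM_POLY = 0b10011  # x^4 + x + 1
EXP = []
value = 1
for _ in range(15):
    EXP.append(value)
    value <<= 1
    if value & 0b10000:
        value ^= PRIM_POLY
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def generator_poly(num_roots):
    """Coefficients of (x+alpha)...(x+alpha^num_roots); index = power of x."""
    poly = [1]
    for i in range(1, num_roots + 1):
        shifted = [0] + poly
        scaled = [gf_mul(c, EXP[i]) for c in poly] + [0]
        poly = [s ^ t for s, t in zip(shifted, scaled)]
    return poly


def rs_encode(data_symbols, k):
    """Encode k 4-bit symbols into a 15-symbol systematic codeword."""
    if k not in (5, 7) or len(data_symbols) != k:
        raise ValueError("expected 5 or 7 data symbols")
    gen = generator_poly(15 - k)
    regs = [0] * (15 - k)            # shift register stages
    out = []
    for d in data_symbols:           # control low: data passes to the output
        feedback = d ^ regs[-1]
        for i in range(len(regs) - 1, 0, -1):
            regs[i] = regs[i - 1] ^ gf_mul(feedback, gen[i])
        regs[0] = gf_mul(feedback, gen[0])
        out.append(d)
    for _ in range(15 - k):          # control high: zeros fed back, shift out
        out.append(regs[-1])
        regs = [0] + regs[:-1]
    return out


def syndromes(codeword, num_roots):
    """Evaluate the codeword at alpha^1..alpha^num_roots (all zero if valid)."""
    result = []
    for i in range(1, num_roots + 1):
        acc = 0
        for sym in codeword:         # first symbol is the highest-degree term
            acc = gf_mul(acc, EXP[i]) ^ sym
        result.append(acc)
    return result


codeword = rs_encode([1, 2, 3, 4, 5], 5)
print(codeword[:5], syndromes(codeword, 10))
```

The first k output symbols equal the inputs, and a valid codeword evaluates to zero at every root of the generator polynomial, which gives a simple self-check for either mode.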

[3582] The performance criterion that must be met is that the followingmust be carried out within 63 cycles

[3583] load one tag's raw data into TE_tagdata

[3584] encode the raw tag data

[3585] store the encoded tag data in the Encoded Tag Data Interface

[3586] In the case of the raw fixed tag data at the start of a page,there is no definite performance criterion except that it should beencoded and stored as fast as possible.

[3587] 26.7.10 Galois Field Elements and Their Representation

[3588] A Galois Field is a set of elements in which we can do addition,subtraction, multiplication and division without leaving the set.

[3589] The TE uses RS encoding over the Galois Field GF(2⁴). There are2⁴ elements in GF(2⁴) and they are generated using the primitivepolynomial p(x)=x⁴+x+1.

[3590] The 16 elements of GF(2⁴) can be represented in a number of different ways. Table 179 shows three possible representations: the power, polynomial and 4-tuple representation.

TABLE 179. GF(2⁴) representations

power            polynomial         4-tuple representation
representation   representation     (a0 a1 a2 a3)
0                0                  (0 0 0 0)
1                1                  (1 0 0 0)
α                x                  (0 1 0 0)
α²               x²                 (0 0 1 0)
α³               x³                 (0 0 0 1)
α⁴               1 + x              (1 1 0 0)
α⁵               x + x²             (0 1 1 0)
α⁶               x² + x³            (0 0 1 1)
α⁷               1 + x + x³         (1 1 0 1)
α⁸               1 + x²             (1 0 1 0)
α⁹               x + x³             (0 1 0 1)
α¹⁰              1 + x + x²         (1 1 1 0)
α¹¹              x + x² + x³        (0 1 1 1)
α¹²              1 + x + x² + x³    (1 1 1 1)
α¹³              1 + x² + x³        (1 0 1 1)
α¹⁴              1 + x³             (1 0 0 1)
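The representations in Table 179 follow from repeatedly multiplying by x and reducing modulo p(x)=x⁴+x+1. A minimal Python sketch of that construction is given below. The function names and the choice to hold each element as a 4-bit integer (bit i is the coefficient of x^i) are assumptions made for illustration.

```python
# Sketch: generate the non-zero elements of GF(2^4) from p(x) = x^4 + x + 1.
PRIM_POLY = 0b10011


def gf16_powers():
    """Return [alpha^0, ..., alpha^14] as 4-bit ints (bit i = coeff of x^i)."""
    powers = []
    value = 1
    for _ in range(15):
        powers.append(value)
        value <<= 1              # multiply by x
        if value & 0b10000:      # a degree-4 term appeared: subtract p(x)
            value ^= PRIM_POLY
    return powers


def as_tuple(value):
    """Return the 4-tuple (a0, a1, a2, a3) of a field element."""
    return tuple((value >> i) & 1 for i in range(4))


for exponent, element in enumerate(gf16_powers()):
    print(f"alpha^{exponent:<2} {as_tuple(element)}")
```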

[3591] 26.7.11 Multiplication of GF(2⁴) Elements

[3592] The multiplication of two field elements α^(a) and α^(b) isdefined as

[3593] α^(c)=α^(a).α^(b)=α^((a+b)modulo 15)

[3594] Thus

α¹.α²=α³

α⁵.α¹⁰=α¹⁵=α⁰=1

α⁶.α¹²=α³

[3595] So if we have the elements in exponential form, multiplication issimply a matter of modulo 15 addition.

[3596] If the elements are in polynomial/tuple form, the polynomialsmust be multiplied and reduced mod x⁴+x+1.

[3597] Suppose we wish to multiply the two field elements in GF(2⁴):

α^(a) =a ₃ x ³ +a ₂ x ² +a ₁ x ¹ +a ₀

α^(b) =b ₃ x ³ +b ₂ x ² +b ₁ x ¹ +b ₀

[3598] where a_(i), b_(i) are in the field (0,1) (i.e. modulo 2 arithmetic)

[3599] Multiplying these out and using x⁴+x+1=0 we get:

α^(a+b)=[(a₀b₃+a₁b₂+a₂b₁+a₃b₀)+a₃b₃]x³
 +[(a₀b₂+a₁b₁+a₂b₀)+a₃b₃+(a₃b₂+a₂b₃)]x²
 +[(a₀b₁+a₁b₀)+(a₃b₂+a₂b₃)+(a₁b₃+a₂b₂+a₃b₁)]x
 +[a₀b₀+a₁b₃+a₂b₂+a₃b₁]

α^(a+b)=[a₀b₃+a₁b₂+a₂b₁+a₃(b₀+b₃)]x³
 +[a₀b₂+a₁b₁+a₂(b₀+b₃)+a₃(b₂+b₃)]x²
 +[a₀b₁+a₁(b₀+b₃)+a₂(b₂+b₃)+a₃(b₁+b₂)]x
 +[a₀b₀+a₁b₃+a₂b₂+a₃b₁]
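The closed-form product can be cross-checked against a plain shift-and-reduce multiplication. In the Python sketch below, AND stands for multiplication and XOR for addition of the modulo 2 coefficients. The function names and the integer encoding (bit i holds a_(i) or b_(i)) are assumptions for illustration only.

```python
# Sketch: GF(2^4) multiplication using the expanded formula, checked against
# a generic shift-and-reduce multiply modulo x^4 + x + 1.

def gf16_mul_formula(a, b):
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    c3 = (a0 & b3) ^ (a1 & b2) ^ (a2 & b1) ^ (a3 & (b0 ^ b3))
    c2 = (a0 & b2) ^ (a1 & b1) ^ (a2 & (b0 ^ b3)) ^ (a3 & (b2 ^ b3))
    c1 = (a0 & b1) ^ (a1 & (b0 ^ b3)) ^ (a2 & (b2 ^ b3)) ^ (a3 & (b1 ^ b2))
    c0 = (a0 & b0) ^ (a1 & b3) ^ (a2 & b2) ^ (a3 & b1)
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3)


def gf16_mul_shift(a, b):
    result = 0
    for i in range(4):               # schoolbook polynomial product
        if (b >> i) & 1:
            result ^= a << i
    for degree in (6, 5, 4):         # reduce using x^4 = x + 1
        if (result >> degree) & 1:
            result ^= 0b10011 << (degree - 4)
    return result


mismatches = [(a, b) for a in range(16) for b in range(16)
              if gf16_mul_formula(a, b) != gf16_mul_shift(a, b)]
print(mismatches)  # an empty list means the two methods agree everywhere
```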

[3600] If we wish to multiply an arbitrary field element by a fixedfield element we get a more simple form. Suppose we wish to multiplyα^(b) by α³.

[3601] In this case α³=x³ so (a0 a1 a2 a3)=(0001). Substituting thisinto the above equation gives

α^(c)=(b ₀ +b ₃)x ³+(b ₂ +b ₃)x ²+(b ₁ +b ₂)x+b ₁

[3602] This can be implemented using simple XOR gates as shown in FIG.214

[3603] 26.7.12 Addition of GF(2⁴) Elements

[3604] If the elements are in their polynomial/tuple form, polynomialsare simply added.

[3605] Suppose we wish to add the two field elements in GF(2⁴):

α^(a)=a₃x³+a₂x²+a₁x+a₀

α^(b)=b₃x³+b₂x²+b₁x+b₀

[3606] where a_(i), b_(i) are in the field (0,1) (i.e. modulo 2 arithmetic)

α^(c)=α^(a)+α^(b)=(a₃+b₃)x³+(a₂+b₂)x²+(a₁+b₁)x+(a₀+b₀)

[3607] Again this can be implemented using simple XOR gates as shown inFIG. 215

[3608] 26.7.13 Reed Solomon Implementation

[3609] The designer can decide to create the relevant addition andmultiplication circuits and instantiate them where necessary.Alternatively the feedback multiplications can be combined as follows.Consider the multiplication

α^(a).α^(b)=α^(c)

[3610] or in terms of polynomials

(a ₃ x ³ +a ₂ x ² +a ₁ x+a ₀).(b ₃ x ³ +b ₂ x ² +b ₁ x+b ₀)=(c ₃ x ³ +c₂ x ² +c ₁ x+c ₀)

[3611] If we substitute all of the possible field elements in for α^(a) and express α^(c) in terms of α^(b), we get the table of results shown in Table 180.

TABLE 180. α^(c)=α^(a).α^(b) for each fixed field element α^(a)=a₃x³+a₂x²+a₁x+a₀, with α^(c)=c₃x³+c₂x²+c₁x+c₀ expressed in terms of the bits of α^(b)

fixed field
element   (a0 a1 a2 a3)   c0            c1            c2            c3
0         (0 0 0 0)       0             0             0             0
1         (1 0 0 0)       b₀            b₁            b₂            b₃
α         (0 1 0 0)       b₃            b₀+b₃         b₁            b₂
α²        (0 0 1 0)       b₂            b₂+b₃         b₀+b₃         b₁
α³        (0 0 0 1)       b₁            b₁+b₂         b₂+b₃         b₀+b₃
α⁴        (1 1 0 0)       b₀+b₃         b₀+b₁+b₃      b₁+b₂         b₂+b₃
α⁵        (0 1 1 0)       b₂+b₃         b₀+b₂         b₀+b₁+b₃      b₁+b₂
α⁶        (0 0 1 1)       b₁+b₂         b₁+b₃         b₀+b₂         b₀+b₁+b₃
α⁷        (1 1 0 1)       b₀+b₁+b₃      b₀+b₂+b₃      b₁+b₃         b₀+b₂
α⁸        (1 0 1 0)       b₀+b₂         b₁+b₂+b₃      b₀+b₂+b₃      b₁+b₃
α⁹        (0 1 0 1)       b₁+b₃         b₀+b₁+b₂+b₃   b₁+b₂+b₃      b₀+b₂+b₃
α¹⁰       (1 1 1 0)       b₀+b₂+b₃      b₀+b₁+b₂      b₀+b₁+b₂+b₃   b₁+b₂+b₃
α¹¹       (0 1 1 1)       b₁+b₂+b₃      b₀+b₁         b₀+b₁+b₂      b₀+b₁+b₂+b₃
α¹²       (1 1 1 1)       b₀+b₁+b₂+b₃   b₀            b₀+b₁         b₀+b₁+b₂
α¹³       (1 0 1 1)       b₀+b₁+b₂      b₃            b₀            b₀+b₁
α¹⁴       (1 0 0 1)       b₀+b₁         b₂            b₃            b₀

[3612] To implement these fixed multiplications, the following signals are required:

[3613] b₀, b₁, b₂, b₃,

[3614] (b₀+b₁), (b₀+b₂), (b₀+b₃), (b₁+b₂), (b₁+b₃), (b₂+b₃),

[3615] (b₀+b₁+b₂), (b₀+b₁+b₃), (b₀+b₂+b₃), (b₁+b₂+b₃),

[3616] (b₀+b₁+b₂+b₃)
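Each row of Table 180 can be derived mechanically, because multiplication by a fixed element is linear in the bits of α^(b): output bit c_(j) is the XOR of every b_(i) for which the product of the fixed element and x^i has bit j set. The Python sketch below lists those XOR terms for any fixed multiplier. The function names and the integer encoding (bit i is the coefficient of x^i) are illustrative assumptions.

```python
# Sketch: derive the XOR network for multiplication by a fixed GF(2^4) element.

def gf16_mul(a, b):
    """Shift-and-add multiply reduced modulo x^4 + x + 1."""
    result = 0
    for i in range(4):
        if (b >> i) & 1:
            result ^= a << i
    for degree in (6, 5, 4):
        if (result >> degree) & 1:
            result ^= 0b10011 << (degree - 4)
    return result


def xor_terms(fixed):
    """terms[j] lists the indices i of the bits b_i that are XORed into c_j."""
    terms = []
    for j in range(4):
        inputs = [i for i in range(4)
                  if (gf16_mul(fixed, 1 << i) >> j) & 1]
        terms.append(inputs)
    return terms


# alpha^3 is x^3, i.e. the integer 0b1000 in this encoding.
print(xor_terms(0b1000))  # [[1], [1, 2], [2, 3], [0, 3]]
```

For α³ the result reads c0=b₁, c1=b₁+b₂, c2=b₂+b₃ and c3=b₀+b₃, matching the corresponding row of the table.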

[3617] The implementation of the circuit can be seen in Figure. The maincomponents are XOR gates, 4-bit shift registers and multiplexers.

[3618] The RS encoder has 4 input lines labelled 0, 1, 2 & 3 and 4output lines labelled 0, 1, 2 & 3. This labelling corresponds to thesubscripts of the polynomial/4-tuple representation. The mapping of4-bit symbols from the TE tagdata register into the RS is as follows:

[3619] the LSB in the TE_tagdata is fed into line0

[3620] the next most significant LSB is fed into line1

[3621] the next most significant LSB is fed into line2

[3622] the MSB is fed into line3

[3623] The RS output mapping to the Encoded tag data interface is similar. Two encoded symbols are stored in an 8-bit address. Within these 8 bits:

[3624] line0 is fed into the LSB (bit 0/4)

[3625] line1 is fed into the next most significant LSB (bit 1/5)

[3626] line2 is fed into the next most significant LSB (bit 2/6)

[3627] line3 is fed into the MSB (bit 3/7)

[3628] 26.7.14 2D Decoder

[3629] The 2D decoder is selected when TE_decode2den=1. It operates on variable tag data only. Its function is to convert 2 bits into 4 bits according to Table 181.

TABLE 181. Operation of 2D decoder

input   output
0 0     0 0 0 1
0 1     0 0 1 0
1 0     0 1 0 0
1 1     1 0 0 0
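Table 181 is a 2-to-4 one-hot decode. A minimal Python sketch follows, assuming the table lists bits most significant first. The function name decode_2d is hypothetical.

```python
# Sketch of the 2-bit to 4-bit decode of Table 181 (MSB-first reading assumed).

def decode_2d(two_bits):
    """Map a 2-bit value (0..3) to a 4-bit one-hot value."""
    if not 0 <= two_bits <= 3:
        raise ValueError("input must be a 2-bit value")
    return 1 << two_bits


for value in range(4):
    print(format(value, "02b"), "->", format(decode_2d(value), "04b"))
```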

[3630] 26.7.15 Encoded Tag Data Interface

[3631] The encoded tag data interface contains an encoded fixed tag datastore interface and an encoded variable tag data store interface, asshown in FIG. 217.

[3632] The two reord units simply reorder the 9 input bits to map low-order codewords into the bit selection component of the address as shown in Table 182. Reordering of write addresses is not necessary since the addresses are already in the correct format.

TABLE 182. Reord unit

        input                                 output
bit#    bit   interpretation                  bit   interpretation
8       A     select 1 of 8 codewords         A     select 1 of 4 codeword tables
7       B                                     B
6       C                                     D     select 1 of 15 symbols
5       D     select 1 of 15 symbols          E
4       E                                     F
3       F                                     G
2       G                                     C     select 1 of 8 bits
1       H     select 1 of 4 bits              H
0       I                                     I
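The reordering of Table 182 moves input bit C (bit 6) down to bit 2 and shifts bits D to G up by one position, so that the address ABCDEFGHI becomes ABDEFGCHI. A Python sketch of that bit shuffle is shown below. The function name reord and the integer representation of the 9-bit address are assumptions for illustration.

```python
# Sketch: reorder a 9-bit read address from ABCDEFGHI to ABDEFGCHI.

def reord(address):
    a_b = (address >> 7) & 0b11      # bits 8..7: A, B
    c = (address >> 6) & 0b1         # bit 6: C
    d_g = (address >> 2) & 0b1111    # bits 5..2: D, E, F, G
    h_i = address & 0b11             # bits 1..0: H, I
    return (a_b << 7) | (d_g << 3) | (c << 2) | h_i


print(format(reord(0b001000000), "09b"))  # C moves to bit 2: 000000100
```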

[3633] The encoded fixed data interface is a single 15×8-bit RAM with 2read ports and 1 write port. As it is only written to during page setuptime (it is fixed for the duration of a page) there is no need forsimultaneous read/write access. However the fixed data store must becapable of decoding two simultaneous reads in a single cycle. FIG. 218shows the implementation of the fixed data store.

[3634] The encoded variable tag data interface is a double buffered3×15×8-bit RAM with 2 read ports and 1 write port. The double bufferingallows one tag's data to be read (two reads in a single cycle) while thenext tag's variable data is being stored. Write addressing is 6 bits: 2bits of address for selecting 1 of 3, and 4 bits of address forselecting 1 of 15. Read addressing is the same with the addition of 3more address bits for selecting 1 of 8.

[3635] FIG. 219 shows the implementation of the encoded variable tag data store. Double buffering is implemented via two sub-buffers. Each time an AdvTag pulse is received, the sense of which sub-buffer is being read from or written to changes. This is accomplished by a 1-bit flag called wrsb0. Although the initial state of wrsb0 is irrelevant, it must invert upon receipt of an AdvTag pulse. The structure of each sub-buffer is shown in FIG. 220.

[3636] 26.8 Tag Format Structure (TFS) Interface

[3637] 26.8.1 Introduction

[3638] The TFS specifies the contents of every dot position within a tag's border, i.e.:

[3639] is the dot part of the background?

[3640] is the dot part of the data?

[3641] The TFS is broken up into Tag Line Structures (TLS) which specifythe contents of every dot position in a particular line of a tag. EachTLS consists of three tables—A, B and C (see FIG. 221).

[3642] For a given line of dots, all the tags on that line correspond tothe same tag line structure.

[3643] Consequently, for a given line of output dots, a single tag linestructure is required, and not the entire TFS. Double buffering allowsthe next tag line structure to be fetched from the TFS in DRAM while theexisting tag line structure is used to render the current tag line.

[3644] The TFS interface is responsible for loading the appropriate lineof the tag format structure as the tag encoder advances through thepage. It is also responsible for producing table A and table B outputsfor two consecutive dot positions in the current tag line.

[3645] There is a TLS for every dot line of a tag.

[3646] All tags that are on the same line have the exact same TLS.

[3647] A tag can be up to 384 dots wide, so each of these 384 dots mustbe specified in the TLS.

[3648] The TLS information is stored in DRAM and one TLS must be readinto the TFS Interface for each line of dots that are outputted to theTag Plane Line Buffers.

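[3649] Each TLS is read from DRAM as five 256-bit words (1280 bits). The TLS itself occupies 1088 bits, of which the last 22 bits of Table C are unused, so 214 bits of the burst carry no table data: the 22 unused Table C bits plus 192 padding bits in the last 256-bit DRAM read.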

[3650] 26.8.2 I/O Specification

TABLE 183. Tag Format Structure Interface Port List

signal name                    signal type   description
Pclk                           In            SoPEC system clock
prst_n                         In            Active-low, synchronous reset in pclk domain
top_go                         In            Go signal from TE top level

DRAM
diu_data[63:0]                 In            Data from DRAM
diu_tfs_rack                   In            Data acknowledge from DRAM
diu_tfs_rvalid                 In            Data valid from DRAM
tfs_diu_rreq                   Out           Read request to DRAM
tfs_diu_radr[21:5]             Out           Read address to DRAM

tag encoder top level
top_advtagline                 In            Pulsed after the last line of a row of tags
top_tagaltsense                In            For even tag rows = 0 i.e. 0,2,4 . . . For odd tag rows = 1 i.e. 1,3,5 . . .
top_lastdotintag               In            Last dot in tag is currently being processed
top_dotposvalid                In            Current dot position is a tag dot and its structure data and tag data is available
top_tagdotnum[7:0]             In            Counts from zero up to TE_tagmaxdotpairs (min. = 1, max. = 192)
tfsi_valid                     Out           TLS tables A, B and C, ready for use
tfsi_ta_dot0[1:0]              Out           Even entry from Table A corresponding to top_tagdotnum
tfsi_ta_dot1[1:0]              Out           Odd entry from Table A corresponding to top_tagdotnum

tag encoder top level (PCU read decoder)
tfs_te_tfsstartadr[23:0]       Out           TFS tfsstartadr register
tfs_te_tfsendadr[23:0]         Out           TFS tfsendadr register
tfs_te_tfsfirstlineadr[23:0]   Out           TFS tfsfirstlineadr register
tfs_te_currtfsadr[23:0]        Out           TFS currtfsadr register

TDI
tfsi_tdi_adr0[8:0]             Out           Read address for dot0 (even dot)
tfsi_tdi_adr1[8:0]             Out           Read address for dot1 (odd dot)

[3651] 26.8.2.1 State Machine

[3652] The state machine is responsible for generating control signalsfor the various TFS table units, and to load the appropriate line fromthe TFS. The states are explained below.

[3653] idle: Wait for top_go to become active. Pulse adv_tfs_line for 1 cycle to reset the tawradr and tbwradr registers. Because pulsing adv_tfs_line switches the read/write sense of Table B, the sense of Table A is switched here as well so that the two tables stay in step, i.e. wrta0=NOT(wrta0).

[3654] diu_access:—In the diu_access state a request is sent to the DIU.Once an ack signal is received Table A write enable is asserted and theFSM moves to the tls_load state.

[3655] tls_load: The DRAM access is a burst of 5 256-bit accesses, ultimately returned by the DIU as 5*(4*64-bit) words, i.e. 20 64-bit reads numbered 0 to 19. There will be 192 padded bits in the last 256-bit DRAM word. The first 12 64-bit reads (0 to 11) are for Table A, reads 12 to 15 and the first part of read 16 are for Table B, and the remainder of read 16 is for Table C. The counter read_num is used to identify which data goes to which table. The Table B data is stored temporarily in a 288-bit register until the tls_update state, hence tbwe does not become active until read_num=16.

[3656] The DIU data goes directly into Table A (12*64).

[3657] The DIU data for Table B is loaded into a 288-bit register.

[3658] The DIU data goes directly into Table C.
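The split of the 64-bit reads between the three tables follows from the table sizes given elsewhere in this section (384 2-bit entries for Table A, 32 9-bit entries for Table B, two 5-bit pointers plus 22 unused bits for Table C). The Python sketch below works through that arithmetic. It assumes the tables are packed contiguously in the order A, B, C from the start of the burst, and the names used are illustrative.

```python
# Sketch: which table(s) each 64-bit read of a TLS burst carries.
TABLES = [("A", 384 * 2), ("B", 32 * 9), ("C", 2 * 5 + 22)]
WORD_BITS = 64
BURST_BITS = 5 * 256
TLS_BITS = sum(bits for _, bits in TABLES)


def tables_in_read(read_num):
    """Return the names of the tables overlapping 64-bit read read_num."""
    lo, hi = read_num * WORD_BITS, (read_num + 1) * WORD_BITS
    names = []
    start = 0
    for name, bits in TABLES:
        end = start + bits
        if lo < end and hi > start:
            names.append(name)
        start = end
    if hi > TLS_BITS:                # bits beyond the TLS are padding
        names.append("pad")
    return names


print(TLS_BITS, TLS_BITS // 32, BURST_BITS - TLS_BITS)   # 1088 34 192
print([tables_in_read(n) for n in (0, 11, 12, 16, 17)])
```

Under these assumptions reads 0 to 11 carry Table A, reads 12 to 15 carry Table B, read 16 is shared by Tables B and C, and reads 17 to 19 are padding.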

[3659] tls_update: The 288 bits in Table B need to be written to a 32*9 buffer. The tls_update state takes care of this using the read_num counter.

[3660] tls_next: This state checks the logic level of tfsvalid and switches the read/write senses of Table A (wrta0) and, a cycle later, Table B (using the adv_tfs_line pulse). The reason for switching Table A a cycle early is to make sure the top_level address via tagdotnum is pointing to the correct buffer. Keep in mind the top_level is working a cycle ahead of Table A and 2 cycles ahead of Table B.

[3661] If tfsValid is 1, the state machine waits until the advTagLinesignal is received. When it is received, the state machine pulsesadvTFSLine (to switch read/write sense in tables A, B, C), and startsreading the next line of the TFS from currTFSAdr.

[3662] If tfsValid is 0, the state machine pulses advTFSLine (to switchread/write sense in tables A, B, C) and then jumps to thetls_tfsvalid_set state where the signal tfsValid is set to 1 (allowingthe tag encoder to start, or to continue if it had been stalled). Thestate machine can then start reading the next line of the TFS fromcurrTFSAdr.

[3663] tls_tfsvalid_next:—Simply sets the tfsvalid signal and returnsthe FSM to the diu_access state.

[3664] If an advTagLine signal is received before the next line of theTFS has been read in, tfsValid is cleared to 0 and processing continuesas outlined above.

[3665] 26.8.2.2 Bandstore Wrapping

[3666] Both TD and TFS storage in DRAM can wrap around the bandstorearea. The bounds of the band store are described by inputs from the CDUshown in Table. The TD and TFS DRAM interfaces therefore supportbandstore wrapping. If the TD or TFS DRAM interface increments anaddress it is checked to see if it matches the end of bandstore address.If so, then the address is mapped to the start of the bandstore.
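The wrapping rule can be sketched as a small address-increment helper. The Python fragment below is only one reading of the paragraph above: it assumes that the end of bandstore address is the last valid address and that the comparison is made on the address being incremented. The function and parameter names are hypothetical.

```python
# Sketch of bandstore address wrapping (assumes an inclusive end address).

def next_bandstore_address(current, start_of_bandstore, end_of_bandstore):
    """Return the address following current, wrapping at the end address."""
    if current == end_of_bandstore:
        return start_of_bandstore
    return current + 1


address = 5
visited = []
for _ in range(6):
    visited.append(address)
    address = next_bandstore_address(address, 2, 7)
print(visited)  # [5, 6, 7, 2, 3, 4]
```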

[3667] The TFS state flow diagram is shown below.

[3668] 26.8.3 Generating a Tag From Tables A, B and C

[3669] The TFS contains an entry for each dot position within the tag'sbounding box. Each entry specifies whether the dot is part of theconstant background pattern or part of the tag's data component (bothfixed and variable).

[3670] The TFS therefore has TagHeight×TagWidth entries, where TagHeightis the height of the tag in dot-lines and TagWidth is the width of thetag in dots. The TFS entries that specify a single dot-line of a tag areknown as a Tag Line Structure.

[3671] The TFS contains a TLS for each of the 1600 dpi lines in thetag's bounding box. Each TLS contains three contiguous tables, known astables A, B and C.

[3672] Table A contains 384 2-bit entries i.e. one entry for each dot ina single line of a tag up to the maximum width of a tag. The actualnumber of entries used should match the size of the bounding box for thetag in the dot dimension, but all 384 entries must be present.

[3673] Table B contains 32 9-bit data address that refer to (in order ofappearance) the data dots present in the particular line. Again, all 32entries must be present, even if fewer are used.

[3674] Table C contains two 5-bit pointers into table B and is followedby 22 unused bits. The total length of each TLS is therefore 34 32-bitwords.

[3675] Each output dot value is generated as follows: Each entry in Table A consists of 2 bits, bit0 and bit1. These 2 bits are interpreted according to Table 184, Table 185 and Table 186.

TABLE 184. Interpretation of bit0 from entry in Table A

bit0   interpretation
0      the output bit comes directly from bit1 (see Table).
1      the output bit comes from a data bit. Bit1 is used in conjunction with Tag Line Structure Table B to determine which data bit will be output.

[3676] TABLE 185. Interpretation of bit1 from entry in table A when bit0 = 0

bit1   interpretation
0      output 0
1      output 1

[3677] TABLE 186. Interpretation of bit1 from entry in table A when bit0 = 1

bit1   interpretation
0      output data bit pointed to by current index into Table B.
1      output data bit pointed to by current index into Table B, and advance index by 1.

[3678] If bit0=0 then the output dot for this entry is part of theconstant background pattern. The dot value itself comes from bit1 i.e.if bit1=0 then the output is 0 and if bit1=1 then the output is 1. Ifbit0=1 then the output dot for this entry comes from the variable orfixed tag data. Bit1 is used in conjunction with Tables B and C todetermine data bits to use.

[3679] To understand the interpretation of bit1 when bit0=1 we need to know what is stored in Table B. Table B contains the addresses of all the data bits that are used in the particular line of a tag in order of appearance. Therefore, up to 32 different data bits can appear in a line of a tag. The address of the first data dot in a tag will be given by the address stored in entry 0 of Table B. As we advance along the various data dots we will advance through the various Table B entries. Each Table B entry is 9 bits long and each points to a specific variable or fixed data bit for the tag. Each tag contains a maximum of 120 fixed and 360 variable data bits, for a total of 480 data bits. To aid address decoding, the addresses are based on the RS encoded tag data. The following table lists the interpretation of the 9-bit addresses.

TABLE 187. Interpretation of 9-bit tag data address in Table B

bit pos   name             description
8-6       CodeWordSelect   Select 1 of 8 codewords. Codewords 0, 1, 2, 3, 4, 5 are variable data. Codewords 6, 7 are fixed data.
5-2       SymbolSelect     Select 1 of 15 symbols (1111 invalid)
1-0       BitSelect        Select 1 of 4 bits from the selected symbol
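The field layout of Table 187 can be illustrated with a short decode routine. The Python sketch below splits a 9-bit Table B address into its three fields. The function name and the returned tuple layout are assumptions made for illustration.

```python
# Sketch: split a 9-bit Table B address into the fields of Table 187.

def decode_tag_data_address(address):
    """Return (codeword, symbol, bit, kind) for a 9-bit tag data address."""
    if not 0 <= address < 512:
        raise ValueError("address must fit in 9 bits")
    codeword = (address >> 6) & 0b111   # bits 8..6: CodeWordSelect
    symbol = (address >> 2) & 0b1111    # bits 5..2: SymbolSelect
    bit = address & 0b11                # bits 1..0: BitSelect
    if symbol == 0b1111:
        raise ValueError("symbol select 1111 is invalid")
    kind = "fixed" if codeword >= 6 else "variable"
    return codeword, symbol, bit, kind


print(decode_tag_data_address(0b110010110))  # (6, 5, 2, 'fixed')
```

The address space covers 8 codewords of 15 symbols of 4 bits, i.e. 480 data bits, which matches the 120 fixed plus 360 variable bits stated above.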

[3680] If the fixed data is supplied to the TE in an unencoded form, thesymbols derived from codeword 0 of fixed data are written to codeword 6and the symbols derived from fixed data codeword 1 are written tocodeword 7. The data symbols are stored first and then the remainingredundancy symbols are stored afterwards, for a total of 15 symbols.Thus, when 5 data symbols are used, the 5 symbols derived from bits 0-19are written to symbols 0-4, and the redundancy symbols are written tosymbols 5-14. When 7 data symbols are used, the 7 symbols derived frombits 0-27 are written to symbols 0-6, and the redundancy symbols arewritten to symbols 7-14

[3681] However, if the fixed data is supplied to the TE in a pre-encoded form, the encoding could theoretically be anything. Consequently the 120 bits of fixed data is copied to codewords 6 and 7 as shown in Table 188.

TABLE 188. Mapping of fixed data to codeword/symbols when no redundancy encoding

input bits   output symbol range   output codeword
0-19         0-4                   6
20-39        0-4                   7
40-59        5-9                   6
60-79        5-9                   7
80-99        10-14                 6
100-119      10-14                 7
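The mapping of Table 188 alternates 20-bit groups between codewords 6 and 7 and advances the symbol range by 5 after every second group. The Python sketch below computes the destination of a single input bit. The table only fixes the codeword and the symbol range, so the ordering of symbols and bits within each 20-bit group used here (consecutive 4-bit groups, lowest bit first) is an assumption, as is the function name.

```python
# Sketch: map a pre-encoded fixed data bit index to (codeword, symbol, bit).

def map_fixed_bit(input_bit):
    if not 0 <= input_bit < 120:
        raise ValueError("fixed data has 120 bits")
    group, offset = divmod(input_bit, 20)        # six groups of 20 bits
    codeword = 6 + group % 2                     # groups alternate 6, 7
    symbol = 5 * (group // 2) + offset // 4      # assumed order within a group
    bit_in_symbol = offset % 4                   # assumed LSB-first
    return codeword, symbol, bit_in_symbol


print(map_fixed_bit(0), map_fixed_bit(20), map_fixed_bit(119))
```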

[3682] It is important to note that the interpretation of bit1 from Table A (when bit0=1) is relative. A 5-bit index is used to cycle through the data addresses in Table B. Since the first tag on a particular line may or may not start at the first dot in the tag, an initial value for the index into Table B is needed. Subsequent tags on the same line will always start with an index of 0, and any partial tag at the end of a line will simply finish before the entire tag has been rendered. The initial index required due to the rendering of a partial tag at the start of a line is supplied by Table C. The initial index will be different for each TLS and there are two possible initial indexes since there are effectively two types of rows of tags in terms of initial offsets.

[3683] Table C provides the appropriate start index into Table B (two 5-bit indices). When rendering even rows of tags, entry 0 is used as the initial index into Table B, and when rendering odd rows of tags, entry 1 is used as the initial index into Table B. The second and subsequent tags start at the leftmost dot position within the tag, so can use an initial index of 0.
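The way Tables A, B and C combine to produce the dots of one tag line can be sketched in a few lines of Python. This is a behavioural model only, under stated assumptions: each Table A entry is held as the integer (bit1<<1)|bit0, the encoded tag data is modelled as a dictionary from 9-bit address to bit value, the 5-bit index wraps modulo 32, and all names are hypothetical.

```python
# Sketch: render one line of a tag from Table A, Table B and a start index.

def render_tag_line(table_a, table_b, start_index, tag_data):
    """Return the output dots for the given Table A entries.

    start_index is the initial Table B index (from Table C for the first tag
    on a line, 0 for subsequent tags).
    """
    index = start_index
    dots = []
    for entry in table_a:
        bit0 = entry & 1
        bit1 = (entry >> 1) & 1
        if bit0 == 0:
            dots.append(bit1)                       # constant background dot
        else:
            dots.append(tag_data[table_b[index]])   # data dot via Table B
            if bit1:
                index = (index + 1) & 0x1F          # advance the 5-bit index
    return dots


table_a = [0b00, 0b10, 0b01, 0b11, 0b11, 0b00]
table_b = [100, 200] + [0] * 30
tag_data = {100: 1, 200: 0, 0: 1}
print(render_tag_line(table_a, table_b, 0, tag_data))  # [0, 1, 1, 1, 0, 0]
```

Note how the entry 0b01 outputs a data bit without advancing the index, so the same data bit is repeated by the following entry.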

[3684] 26.8.4 Architecture

[3685] A block diagram of the Tag Format Structure Interface can be seenin FIG. 223.

[3686] 26.8.4.1 Table A Interface

[3687] The implementation of table A is two 16×64-bit RAMs with a smallamount of control logic, as shown in FIG. 224. While one RAM is readfrom for the current line's table A data (4 bits representing 2contiguous table A entries), the other RAM is being written to with thenext line's table A data (64-bits at a time).

[3688] Note: The Table A data to be printed (if each LSB=0) must be passed to the top_level 2 cycles after the read of Table A, due to the 2-stage pipelining in the TFS from registering the Table A and Table B outputs. Hence the extra registering stage that generates ta_dot0_1cyclelater and ta_dot1_1cyclelater.

[3689] Each time an AdvTFSLine pulse is received, the sense of which RAMis being read from or written to changes. This is accomplished by a1-bit flag called wrta0. Although the initial state of wrta0 isirrelevant, it must invert upon receipt of an AdvTFSLine pulse. A 4-bitcounter called taWrAdr keeps the write address for the 12 writes thatoccur after the start of each line (specified by the AdvTFSLine controlinput). The tawe (table A write enable) input is set whenever the datain is to be written to table A. The taWrAdr address counterautomatically increments with each write to table A. Address generationfor tawe and taWrAdr is shown in Table 189.

[3690] 26.8.4.2 Table C Interface

[3691] A block diagram of the table C interface is shown below in FIG.226.

[3692] The address generator for table C contains a 5-bit address register adr that is set to a new address at the start of processing the tag (either of the two table C initial values based on tagAltSense at the start of the line, and 0 for subsequent tags on the same line). Each cycle two addresses into table B are generated based on the two 2-bit inputs (in0 and in1). As shown in Table 189, the output address tbRdAdr0 is always adr and tbRdAdr1 is one of adr and adr+1, and at the end of the cycle adr takes on one of adr, adr+1, and adr+2. An X in the table marks a don't-care output.

TABLE 189. AdrGen lookup table

inputs        outputs
in0   in1     adr0Sel   adr1Sel   adrSel
00    00      X         X         adr
00    01      X         adr       adr
00    10      X         X         adr
00    11      X         adr       adr + 1
01    00      adr       X         adr
01    01      adr       adr       adr
01    10      adr       X         adr
01    11      adr       adr       adr + 1
10    00      X         X         adr
10    01      X         adr       adr
10    10      X         X         adr
10    11      X         adr       adr + 1
11    00      adr       X         adr + 1
11    01      adr       adr+1     adr + 1
11    10      adr       X         adr + 1
11    11      adr       adr+1     adr + 2
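The lookup table reduces to two simple rules per input: an entry uses a Table B address when its bit0 is 1, and it advances the address when both bits are 1. The Python sketch below expresses Table 189 that way. It assumes each 2-bit input is written as bit1 followed by bit0, returns None for don't-care outputs, and wraps the 5-bit register modulo 32. The function name is hypothetical.

```python
# Sketch: behavioural model of the AdrGen lookup table.

def adr_gen(adr, in0, in1):
    """Return (adr0, adr1, next_adr) for two 2-bit Table A entries."""
    uses0, advance0 = in0 & 1, in0 == 0b11
    uses1, advance1 = in1 & 1, in1 == 0b11
    adr0 = adr if uses0 else None                         # None = don't care
    adr1 = ((adr + advance0) & 0x1F) if uses1 else None
    next_adr = (adr + advance0 + advance1) & 0x1F         # 5-bit register
    return adr0, adr1, next_adr


print(adr_gen(3, 0b11, 0b01))  # (3, 4, 4)
print(adr_gen(3, 0b11, 0b11))  # (3, 4, 5)
```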

[3693] 26.8.4.3 Table B Interface

[3694] The table B interface implementation generates two encoded tag data addresses (tfsi_adr0, tfsi_adr1) based on two table B input addresses (tbRdAdr0, tbRdAdr1). A block diagram of table B can be seen in FIG. 227.

[3695] Table B data is initially loaded into the 288-bit table B temporary register via the TFS FSM. Once all 288 bits have been loaded from DRAM, the data is written in 9-bit chunks to the 32*9 register arrays based on tbwradr.

[3696] Each time an AdvTFSLine pulse is received, the sense of which subbuffer is being read from or written to changes. This is accomplished bya 1-bit flag called wrtb0. Although the initial state of wrtb0 isirrelevant, it must invert upon receipt of an AdvTFSLine pulse.

[3697] Note:—The output addresses from Table B are registered.

[3698] 27 Tag FIFO Unit (TFU)

[3699] 27.1 Overview

[3700] The Tag FIFO Unit (TFU) provides the means by which data istransferred between the Tag Encoder (TE) and the HCU. By abstracting thebuffering mechanism and controls from both units, the interface is cleanbetween the data user and the data generator.

[3701] The TFU is a simple FIFO interface to the HCU. The Tag Encoderwill provide support for arbitrary Y integer scaling up to 1600 dpi. Xinteger scaling of the tag dot data is performed at the output of theFIFO in the TFU. There is feedback to the TE from the TFU to allowstalling of the TE during a line. The TE interfaces to the TFU with adata width of 8 bits. The TFU interfaces to the HCU with a data width of1 bit.

[3702] The depth of the TFU FIFO is chosen as 16 bytes so that the FIFOcan store a single 126 dot tag.

[3703] 27.1.1 Interfaces Between TE, TFU and HCU

[3704] 27.1.1.1 TE-TFU Interface

[3705] The interface from the TE to the TFU comprises the followingsignals:

[3706] te_tfu_wdata, 8-bit write data.

[3707] te_tfu_wdatavalid, write data valid.

[3708] te_tfu_wradvline, accompanies the last valid 8-bit write data ina line.

[3709] The interface from the TFU to TE comprises the following signal:

[3710] tfu_te_oktowrite, indicating to the TE that there is spaceavailable in the TFU FIFO.

[3711] The TE writes data to the TFU FIFO as long as the TFU'stfu_te_oktowrite output bit is set. The TE write will not occur unlessdata is accompanied by a data valid signal.

[3712] 27.1.1.2 TFU-HCU Interface

[3713] The interface from the TFU to the HCU comprises the following signals:

[3714] tfu_hcu_tdata, 1-bit data.

[3715] tfu_hcu_avail, data valid signal indicating that there is data available in the TFU FIFO.

[3716] The interface from HCU to TFU comprises the following signal:

[3717] hcu_tfu_ready, indicating to the TFU to supply the next dot.

[3718] 27.1.1.2.1 X Scaling

[3719] Tag data is replicated a scale factor (SF) number of times in the X direction to convert the final output to 1600 dpi. Unlike both the CFU and SFU, which support non-integer scaling, the scaling is integer only. Replication in the X direction is performed at the output of the TFU FIFO on a dot-by-dot basis.

[3720] To account for the case where there may be two SoPEC devices, each generating its own portion of a dot-line, the first dot in a line may not be replicated the total scale-factor number of times by an individual TFU. The dot will ultimately be scaled up correctly with both devices doing part of the scaling, one on its lead-out and the other on its lead-in.

[3721] Note that two SoPEC TEs may be involved in producing the same byte of output tag data straddling the printhead boundary. The HCU of the left SoPEC will accept from its TE the correct number of dots, ignoring any dots in the last byte that do not apply to its printhead. The TE of the right SoPEC will be programmed with the correct number of dots in the tag, and its output will be byte aligned with the left edge of the printhead.
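By way of illustration only, the X replication described above can be sketched in Python. The function name scale_tag_line and its arguments are hypothetical; the sketch assumes the first dot of a line is replicated XFracScale times, every later dot XScale times, and that output stops once HCUDotCount dots have been produced.

```python
def scale_tag_line(dots, x_scale, x_frac_scale, hcu_dot_count):
    """Replicate bi-level tag dots in X (illustrative model, not the RTL).

    dots          -- unscaled dots for one line, in output order
    x_scale       -- integer scale factor for all dots but the first
    x_frac_scale  -- scale factor for the first dot (<= x_scale)
    hcu_dot_count -- number of scaled dots the HCU expects per line
    """
    if not 1 <= x_frac_scale <= x_scale:
        raise ValueError("x_frac_scale must be in the range 1..x_scale")
    out = []
    for i, dot in enumerate(dots):
        reps = x_frac_scale if i == 0 else x_scale
        out.extend([dot] * reps)
        if len(out) >= hcu_dot_count:
            break  # remaining input dots are ignored for this line
    return out[:hcu_dot_count]


print(scale_tag_line([1, 0, 1], x_scale=3, x_frac_scale=2, hcu_dot_count=7))
```

In a two-SoPEC arrangement, the left device would supply the remaining replications of the straddling dot on its lead-out, which is why the right device starts with a reduced count.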

[3722] 27.2 Definitions of I/O

TABLE 190 TFU Port List

| Port Name | Pins | I/O | Description |
|---|---|---|---|
| Clocks and Resets | | | |
| pclk | 1 | In | SoPEC Functional clock. |
| prst_n | 1 | In | Global reset signal. |
| PCU Interface data and control signals | | | |
| pcu_adr[4:2] | 3 | In | PCU address bus. Only 3 bits are required to decode the address space for this block. |
| pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU. |
| tfu_pcu_datain[31:0] | 32 | Out | Read data bus from the TFU to the PCU. |
| pcu_rwn | 1 | In | Common read/not-write signal from the PCU. |
| pcu_tfu_sel | 1 | In | Block select from the PCU. When pcu_tfu_sel is high both pcu_adr and pcu_dataout are valid. |
| tfu_pcu_rdy | 1 | Out | Ready signal to the PCU. When tfu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on tfu_pcu_datain is valid. |
| TE Interface data and control signals | | | |
| te_tfu_wdata[7:0] | 8 | In | Write data for TFU FIFO. |
| te_tfu_wdatavalid | 1 | In | Write data valid signal. |
| te_tfu_wradvline | 1 | In | Advance line signal strobed when the last byte in a line is placed on te_tfu_wdata. |
| tfu_te_oktowrite | 1 | Out | Ready signal indicating TFU has space available in its FIFO and is ready to be written to. |
| HCU Interface data and control signals | | | |
| hcu_tfu_advdot | 1 | In | Signal indicating to the TFU that the HCU is ready to accept the next dot of data from TFU. |
| tfu_hcu_tdata | 1 | Out | Data from the TFU FIFO. |
| tfu_hcu_avail | 1 | Out | Signal indicating valid data available from TFU FIFO. |

[3723] 27.3 Configuration Registers

TABLE 191 TFU Configuration Registers

| Address (TFU_Base +) | Register name | #bits | Value on reset | Description |
|---|---|---|---|---|
| Control registers | | | | |
| 0x00 | Reset | 1 | 1 | A write to this register causes a reset of the TFU. This register can be read to indicate the reset state: 0 - reset in progress; 1 - reset not in progress. |
| 0x04 | Go | 1 | see text | Writing 1 to this register starts the TFU. Writing 0 to this register halts the TFU. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. When Go is asserted all counters are reset, but configuration registers keep their values (i.e. they don't get reset). The TFU must be started before the TE is started. This register can be read to determine if the TFU is running (1 = running, 0 = stopped). |
| Setup registers (constant during processing of page) | | | | |
| 0x08 | XScale | 8 | 1 | Tag scale factor in X direction. |
| 0x0C | XFracScale | 8 | 1 | Tag scale factor in X direction for the first dot in a line (must be programmed to be less than or equal to XScale). |
| 0x10 | TEByteCount | 12 | 0 | The number of bytes to be accepted from the TE per line. Once this number of bytes have been received subsequent bytes are ignored until there is a strobe on te_tfu_wradvline. |
| 0x14 | HCUDotCount | 16 | 0 | The number of (optionally) x-scaled dots per line to be supplied to the HCU. Once this number has been reached the remainder of the current FIFO byte is ignored. |

[3724] 27.4 Detailed Description

[3725] The FIFO is a simple 16-byte store with read and write pointers, and a contents store, FIG. 229. 16 bytes is sufficient to store a single 126 dot tag.

[3726] Each line a total of TEByteCount bytes is read into the FIFO. All subsequent bytes are ignored until there is a strobe on the te_tfu_wradvline signal, whereupon bytes for the next line are stored. On the HCU side, a total of HCUDotCount dots are produced at the output. Once this count is reached any more dots in the FIFO byte currently being processed are ignored. For the first dot in the next line the start of line scale factor, XFracScale, is used.

[3727] The behaviour of these signals and the control signals between the TFU and the TE and HCU is detailed below.

    // Concurrently Executed Code:
    // TE always allowed to write when there's either (a) room or (b) no room and all
    // bytes for that line have been received.
    if ((FifoCntnts != FifoMax) OR (FifoCntnts == FifoMax and ByteToRx == 0)) then
        tfu_te_oktowrite = 1
    else
        tfu_te_oktowrite = 0
    // Data presented to HCU when there is (a) data in FIFO and (b) the HCU has not
    // received all dots for a line
    if (FifoCntnts != 0) AND (BitToTx != 0) then
        tfu_hcu_avail = 1
    else
        tfu_hcu_avail = 0
    // Output mux of FIFO data
    tfu_hcu_tdata = Fifo[FifoRdPnt][RdBit]

    // Sequentially Executed Code:
    if (te_tfu_wdatavalid == 1) AND (FifoCntnts != FifoMax) AND (ByteToRx != 0) then
        Fifo[FifoWrPnt] = te_tfu_wdata
        FifoWrPnt ++
        FifoContents ++
        ByteToRx --
    if (te_tfu_wradvline == 1) then
        ByteToRx = TEByteCount
    if (hcu_tfu_advdot == 1 and FifoCntnts != 0) then {
        BitToTx ++
        if (RepFrac == 1) then
            RepFrac = Xscale
            if (RdBit == 7) then
                RdBit = 0
                FifoRdPnt ++
                FifoContents --
            else
                RdBit++
        else
            RepFrac--
        if (BitToTx == 1) then {
            RepFrac = XFracScale
            RdBit = 0
            FifoRdPnt ++
            FifoContents --
            BitToTx = HCUDotCount
        }
    }

[3728] What is not detailed above is the fact that, since this is a circular buffer, both the fifo read and write-pointers wrap-around to zero after they reach two. Also not detailed is the fact that if there is a change of both the read and write-pointer in the same cycle, the fifo contents counter remains unchanged.
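The two points above (pointer wrap-around, and an unchanged contents counter when a read and a write coincide) can be illustrated with a minimal Python model. The class TagFifo and its cycle method are hypothetical names, the store is modelled at byte granularity only, and the depth is left as a parameter (16 bytes being the FIFO size given in the overview).

```python
class TagFifo:
    """Byte-wide circular buffer with a contents counter (illustrative only)."""

    def __init__(self, depth=16):
        self.depth = depth
        self.store = [0] * depth
        self.rd = 0      # read pointer
        self.wr = 0      # write pointer
        self.count = 0   # number of bytes currently held

    def cycle(self, write_byte=None, read=False):
        """Model one clock: optionally write one byte and/or retire one byte.

        Returns the byte retired in this cycle, or None if nothing was read.
        """
        do_write = write_byte is not None and self.count < self.depth
        do_read = read and self.count > 0
        out = self.store[self.rd] if do_read else None
        if do_write:
            self.store[self.wr] = write_byte
            self.wr = (self.wr + 1) % self.depth   # wrap to zero
        if do_read:
            self.rd = (self.rd + 1) % self.depth   # wrap to zero
        # A simultaneous read and write leaves the counter unchanged.
        self.count += int(do_write) - int(do_read)
        return out
```

A hardware implementation would register these updates on the clock edge; the sketch simply applies them in sequence within one call.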

[3729] 28 Halftoner Compositor Unit (HCU)

[3730] 28.1 Overview

[3731] The Halftoner Compositor Unit (HCU) produces dots for each nozzle in the destination printhead taking account of the page dimensions (including margins). The spot data and tag data are received in bi-level form while the pixel contone data received from the CFU must be dithered to a bi-level representation. The resultant 6 bi-level planes for each dot position on the page are then remapped to 6 output planes and output a dot at a time (6 bits) to the next stage in the printing pipeline, namely the dead nozzle compensator (DNC).

[3732] 28.2 Data Flow

[3733] FIG. 230 shows a simple dot data flow high level block diagram of the HCU. The HCU reads contone data from the CFU, bi-level spot data from the SFU, and bi-level tag data from the TFU.

[3734] Dither matrices are read from the DRAM via the DIU. The calculated output dot (6 bits) is read by the DNC.

[3735] The HCU is given the page dimensions (including margins), and is only started once for the page. It does not need to be programmed in between bands or restarted for each band. The HCU will stall appropriately if its input buffers are starved. At the end of the page the HCU will continue to produce 0 for all dots as long as data is requested by the units further down the pipeline (this allows later units to conveniently flush pipelined data).

[3736] The HCU performs a linear processing of dots, calculating the 6-bit output of a dot in each cycle. The mapping of 6 calculated bits to 6 output bits for each dot allows for such example mappings as compositing of the spot0 layer over the appropriate contone layer (typically black), the merging of CMY into K (if K is present in the printhead), the splitting of K into CMY dots if there is no K in the printhead, and the generation of a fixative output bitstream.

[3737] 28.3 DRAM Storage Requirements

[3738] SoPEC allows for a number of different dither matrix configurations up to 256 bytes wide. The dither matrix is stored in DRAM. Using either a single or double-buffer scheme a line of the dither matrix must be read in by the HCU over a SoPEC line time. SoPEC must produce 13824 dots per line for A4/Letter printing which takes 13824 cycles.

[3739] The following gives the storage and bandwidth requirements for some of the possible configurations of the dither matrix.

[3740] 4 Kbyte DRAM storage required for one 64×64 (preferred) byte dither matrix

[3741] 6.25 Kbyte DRAM storage required for one 80×80 byte dither matrix

[3742] 16 Kbyte DRAM storage required for four 64×64 byte dither matrices

[3743] 64 Kbyte DRAM storage required for one 256×256 byte dither matrix

[3744] It takes 4 read accesses (double buffer) or 8 read accesses (single buffer) to load a line of dither matrix into the dither matrix buffer, the mode being configured by the DoubleLineBuf register.
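The figures above follow from simple arithmetic, sketched here in Python for reference. The helper names are hypothetical; the sketch assumes one byte per dither matrix entry, 256-bit (32-byte) DRAM words, and a buffered line of 128 bytes in double-buffer mode or 256 bytes in single-buffer mode.

```python
DRAM_WORD_BYTES = 256 // 8  # one 256-bit DRAM word


def dither_storage_kbytes(width, height, matrices=1):
    """DRAM storage in Kbytes for byte-per-entry dither matrices."""
    return width * height * matrices / 1024


def reads_per_line(line_bytes):
    """Number of 256-bit reads needed to fill one buffered line."""
    return -(-line_bytes // DRAM_WORD_BYTES)  # ceiling division


for label, size in [
    ("one 64x64", dither_storage_kbytes(64, 64)),
    ("one 80x80", dither_storage_kbytes(80, 80)),
    ("four 64x64", dither_storage_kbytes(64, 64, matrices=4)),
    ("one 256x256", dither_storage_kbytes(256, 256)),
]:
    print(label, size, "Kbyte")
print("double buffer reads:", reads_per_line(128))
print("single buffer reads:", reads_per_line(256))
```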

[3745] 28.4 Implementation

[3746] A block diagram of the HCU is given in FIG. 231.

[3747] 28.4.1 Definition of I/O

TABLE 192 HCU port list and description

| Port name | Pins | I/O | Description |
|---|---|---|---|
| Clocks and reset | | | |
| pclk | 1 | In | System clock. |
| prst_n | 1 | In | System reset, synchronous active low. |
| PCU interface | | | |
| pcu_hcu_sel | 1 | In | Block select from the PCU. When pcu_hcu_sel is high both pcu_adr and pcu_dataout are valid. |
| pcu_rwn | 1 | In | Common read/not-write signal from the PCU. |
| pcu_adr[7:2] | 6 | In | PCU address bus. Only 6 bits are required to decode the address space for this block. |
| pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU. |
| hcu_pcu_rdy | 1 | Out | Ready signal to the PCU. When hcu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on hcu_pcu_datain is valid. |
| hcu_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU. |
| DIU interface | | | |
| hcu_diu_rreq | 1 | Out | HCU read request, active high. A read request must be accompanied by a valid read address. |
| diu_hcu_rack | 1 | In | Acknowledge from DIU, active high. Indicates that a read request has been accepted and the new read address can be placed on the address bus, hcu_diu_radr. |
| hcu_diu_radr[21:5] | 17 | Out | HCU read address. 17 bits wide (256-bit aligned word). |
| diu_hcu_rvalid | 1 | In | Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data. |
| diu_data[63:0] | 64 | In | Read data from DIU. |
| CFU interface | | | |
| cfu_hcu_avail | 1 | In | Indicates valid data present on cfu_hcu_c[3-0]data lines. |
| cfu_hcu_c0data[7:0] | 8 | In | Pixel of data in contone plane 0. |
| cfu_hcu_c1data[7:0] | 8 | In | Pixel of data in contone plane 1. |
| cfu_hcu_c2data[7:0] | 8 | In | Pixel of data in contone plane 2. |
| cfu_hcu_c3data[7:0] | 8 | In | Pixel of data in contone plane 3. |
| hcu_cfu_advdot | 1 | Out | Informs the CFU that the HCU has captured the pixel data on cfu_hcu_c[3-0]data lines and the CFU can now place the next pixel on the data lines. |
| SFU interface | | | |
| sfu_hcu_avail | 1 | In | Indicates valid data present on sfu_hcu_sdata. |
| sfu_hcu_sdata | 1 | In | Bi-level dot data. |
| hcu_sfu_advdot | 1 | Out | Informs the SFU that the HCU has captured the dot data on sfu_hcu_sdata and the SFU can now place the next dot on the data line. |
| TFU interface | | | |
| tfu_hcu_avail | 1 | In | Indicates valid data present on tfu_hcu_tdata. |
| tfu_hcu_tdata | 1 | In | Tag dot data. |
| hcu_tfu_advdot | 1 | Out | Informs the TFU that the HCU has captured the dot data on tfu_hcu_tdata and the TFU can now place the next dot on the data line. |
| DNC interface | | | |
| dnc_hcu_ready | 1 | In | Indicates that DNC is ready to accept data from the HCU. |
| hcu_dnc_avail | 1 | Out | Indicates valid data present on hcu_dnc_data. |
| hcu_dnc_data[5:0] | 6 | Out | Output bi-level dot data in 6 ink planes. |

[3748] 28.4.2 Configuration Registers

[3749] The configuration registers in the HCU are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for the description of the protocol and timing diagrams for reading and writing registers in the HCU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the HCU. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of hcu_pcu_datain. The configuration registers of the HCU are listed in Table 193.

TABLE 193 HCU Registers

| Address (HCU_base +) | Register Name | #bits | Value on Reset | Description |
|---|---|---|---|---|
| Control registers | | | | |
| 0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the HCU. |
| 0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the HCU. Writing 0 to this register halts the HCU. When Go is asserted all counters, flags etc. are cleared or given their initial value, but configuration registers keep their values. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. The HCU should be started after the CFU, SFU, TFU, and DNC. This register can be read to determine if the HCU is running (1 = running, 0 = stopped). |
| Setup registers (constant during processing) | | | | |
| 0x10 | AvailMask | 4 | 0x0 | Mask used to determine which of the dotgen units etc. are to be checked before a dot is generated by the HCU within the specified margins for the specified color plane. If the specified dotgen unit is stalled, then the HCU will also stall. See Table for bit allocation and definition. |
| 0x14 | TMMask | 4 | 0x0 | Same as AvailMask, but used in the top margin area before the appropriate target page is reached. |
| 0x18 | PageMarginY | 32 | 0x0000_0000 | The first line considered to be off the page. |
| 0x1C | MaxDot | 16 | 0x0000 | This is the maximum dot number - 1 present across a page. For example if a page contains 13824 dots, then MaxDot will be 13823. |
| 0x20 | TopMargin | 32 | 0x0000_0000 | The first line on a page to be considered within the target page for contone and spot data. (0 = first printed line of page) |
| 0x24 | BottomMargin | 32 | 0x0000_0000 | The first line in the target bottom margin for contone and spot data (i.e. first line after target page). |
| 0x28 | LeftMargin | 16 | 0x0000 | The first dot on a line within the target page for contone and spot data. |
| 0x2C | RightMargin | 16 | 0xFFFF | The first dot on a line within the target right margin for contone and spot data. |
| 0x30 | TagTopMargin | 32 | 0x0000_0000 | The first line on a page to be considered within the target page for tag data. (0 = first printed line of page) |
| 0x34 | TagBottomMargin | 32 | 0x0000_0000 | The first line in the target bottom margin for tag data (i.e. first line after target page). |
| 0x38 | TagLeftMargin | 16 | 0x0000 | The first dot on a line within the target page for tag data. |
| 0x3C | TagRightMargin | 16 | 0xFFFF | The first dot on a line within the target right margin for tag data. |
| 0x44 | StartDMAdr[21:5] | 17 | 0x0_0000 | Points to the first 256-bit word of the first line of the dither matrix in DRAM. |
| 0x48 | EndDMAdr[21:5] | 17 | 0x0_0000 | Points to the last address of the group of four 256-bit reads (or 8 if single buffering) that reads in the last line of the dither matrix. |
| 0x4C | LineIncrement | 5 | 0x2 | The number of 256-bit words in DRAM from the start of one line of the dither matrix and the start of the next line, i.e. the value by which the DRAM address is incremented at the start of a line so that it points to the start of the next line of the dither matrix. |
| 0x50 | DMInitIndexC0 | 8 | 0x00 | If using the single-buffer scheme this register represents the initial index within 256-byte dither matrix line buffer for contone plane 0. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x54 | DMLwrIndexC0 | 8 | 0x00 | If using the single-buffer scheme this register represents the lower index within 256-byte dither matrix line buffer for contone plane 0. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x58 | DMUprIndexC0 | 8 | 0x3F | If using the single-buffer scheme this register represents the upper index within 256-byte dither matrix line buffer for contone plane 0. After reading the data at this location the index wraps to DMLwrIndexC0. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x5C | DMInitIndexC1 | 8 | 0x00 | If using the single-buffer scheme this register represents the initial index within 256-byte dither matrix line buffer for contone plane 1. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x60 | DMLwrIndexC1 | 8 | 0x00 | If using the single-buffer scheme this register represents the lower index within 256-byte dither matrix line buffer for contone plane 1. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x64 | DMUprIndexC1 | 8 | 0x3F | If using the single-buffer scheme this register represents the upper index within 256-byte dither matrix line buffer for contone plane 1. After reading the data at this location the index wraps to DMLwrIndexC1. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x68 | DMInitIndexC2 | 8 | 0x00 | If using the single-buffer scheme this register represents the initial index within 256-byte dither matrix line buffer for contone plane 2. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x6C | DMLwrIndexC2 | 8 | 0x00 | If using the single-buffer scheme this register represents the lower index within 256-byte dither matrix line buffer for contone plane 2. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x70 | DMUprIndexC2 | 8 | 0x3F | If using the single-buffer scheme this register represents the upper index within 256-byte dither matrix line buffer for contone plane 2. After reading the data at this location the index wraps to DMLwrIndexC2. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x74 | DMInitIndexC3 | 8 | 0x00 | If using the single-buffer scheme this register represents the initial index within 256-byte dither matrix line buffer for contone plane 3. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x78 | DMLwrIndexC3 | 8 | 0x00 | If using the single-buffer scheme this register represents the lower index within 256-byte dither matrix line buffer for contone plane 3. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x7C | DMUprIndexC3 | 8 | 0x3F | If using the single-buffer scheme this register represents the upper index within 256-byte dither matrix line buffer for contone plane 3. After reading the data at this location the index wraps to DMLwrIndexC3. If using double-buffer scheme, only the 7 lsbs are used. |
| 0x80 | DoubleLineBuf | 1 | 0x1 | Selects the dither line buffer mode to be single or double buffer. 0 - single line buffer mode; 1 - double line buffer mode. |
| 0x84 to 0x98 | IOMappingLo | 6 × 32 | 0x0000_0000 | The dot reorg mapping for output inks 0 to 5. For each ink's 64-bit IOMapping value, IOMappingLo represents the low order 32 bits. |
| 0x9C to 0xB0 | IOMappingHi | 6 × 32 | 0x0000_0000 | The dot reorg mapping for output inks 0 to 5. For each ink's 64-bit IOMapping value, IOMappingHi represents the high order 32 bits. |
| 0xB4 to 0xC0 | cpConstant | 4 × 8 | 0x00 | The constant contone value to output for contone plane N when printing in the margin areas of the page. This value will typically be 0. |
| 0xC4 | sConstant | 1 | 0x0 | The constant bi-level value to output for spot when printing in the margin areas of the page. This value will typically be 0. |
| 0xC8 | tConstant | 1 | 0x0 | The constant bi-level value to output for tag data when printing in the margin areas of the page. This value will typically be 0. |
| 0xCC | DitherConstant | 8 | 0xFF | The constant value to use for dither matrix when the dither matrix is not available, i.e. when the signal dm_avail is 0. This value will typically be 0xFF so that cpConstant can easily be 0x00 or 0xFF without requiring a dither matrix (DitherConstant is primarily used for threshold dithering in the margin areas). |
| Debug registers (read only) | | | | |
| 0xD0 | HcuPortsDebug | 14 | N/A | Bit 13 = tfu_hcu_avail; Bit 12 = hcu_tfu_advdot; Bit 11 = sfu_hcu_avail; Bit 10 = hcu_sfu_advdot; Bit 9 = cfu_hcu_avail; Bit 8 = hcu_cfu_advdot; Bit 7 = dnc_hcu_ready; Bit 6 = hcu_dnc_avail; Bits 5-0 = hcu_dnc_data |
| 0xD4 | HcuDotgenDebug | 15 | N/A | Bit 14 = after_top_margin; Bit 13 = in_tag_target_page; Bit 12 = in_target_page; Bit 11 = tp_avail; Bit 10 = s_avail; Bit 9 = cp_avail; Bit 8 = dm_avail; Bit 7 = advdot; Bits 5-0 = [tp, s, cp3, cp2, cp1, cp0] (i.e. 6 bit input to dot reorg units) |
| 0xD8 | HcuDitherDebug1 | 17 | N/A | Bit 17 = advdot; Bit 16 = dm_avail; Bits 15-8 = cp1_dither_val; Bits 7-0 = cp0_dither_val |
| 0xDC | HcuDitherDebug2 | 17 | N/A | Bit 17 = advdot; Bit 16 = dm_avail; Bits 15-8 = cp3_dither_val; Bits 7-0 = cp2_dither_val |

[3750] 28.4.3 Control Unit

[3751] The control unit is responsible for controlling the overall flow of the HCU. It is responsible for determining whether or not a dot will be generated in a given cycle, and what dot will actually be generated, including whether or not the dot is in a margin area, and what dither cell values should be used at the specific dot location. A block diagram of the control unit is shown in FIG. 232.

[3752] The inputs to the control unit are a number of avail flags specifying whether or not a given dotgen unit is capable of supplying 'real' data in this cycle. The term 'real' refers to data generated from external sources, such as contone line buffers, bi-level line buffers, and tag plane buffers. Each dotgen unit informs the control unit whether or not a dot can be generated this cycle from real data. The control unit must also check that the DNC is ready to receive data.

[3753] The contone/spot margin unit is responsible for determining whether the current dot coordinate is within the target contone/spot margins, and the tag margin unit is responsible for determining whether the current dot coordinate is within the target tag margins.

[3754] The dither matrix table interface provides the interface to DRAM for the generation of dither cell values that are used in the halftoning process in the contone dotgen unit.

[3755] 28.4.3.1 Determine advdot

[3756] The HCU does not always require contone planes, bi-level or tag planes in order to produce a page. For example, a given page may not have a bi-level layer, or a tag layer. In addition, the contone and bi-level parts of a page are only required within the contone and bi-level page margins, and the tag part of a page is only required within the tag page margins. Thus output dots can be generated without contone, bi-level or tag data before the respective top margins of a page have been reached, and 0s are generated for all color planes after the end of the page has been reached (to allow later stages of the printing pipeline to flush).

[3757] Consequently the HCU has an AvailMask register that determines which of the various input avail flags should be taken notice of during the production of a page from the first line of the target page, and a TMMask register that has the same behaviour, but is used in the lines before the target page has been reached (i.e. inside the target top margin area). The dither matrix mask bit TMMask[0] is the exception: it applies to all margin areas, not just the top margin. Each bit in the AvailMask refers to a particular avail bit: if the bit in the AvailMask register is set, then the corresponding avail bit must be 1 for the HCU to advance a dot. The bit to avail correspondence is shown in Table 194. Care should be taken with TMMask: if the particular data is not available after the top margin has been reached, then the HCU will stall. Note that the avail bits for contone and spot colors are ANDed with in_target_page after the target page area has been reached to allow dot production in the contone/spot margin areas without needing any data in the CFU and SFU. The avail bit for tag color is ANDed with in_tag_target_page after the target tag page area has been reached to allow dot production in the tag margin areas without needing any data in the TFU.

TABLE 194 Correspondence between bit in AvailMask and avail flag

| bit # in AvailMask | avail flag | description |
|---|---|---|
| 0 | dm_avail | dither matrix data available |
| 1 | cp_avail | contone pixels available |
| 2 | s_avail | spot color available |
| 3 | tp_avail | tag plane available |

[3758] Each of the input avail bits is processed with its appropriate mask bit and the after_top_margin flag (note that the dither matrix is the exception: it is processed with in_target_page). The output bits are ANDed together along with Go and output_buff_full (which specifies whether the output buffer is ready to receive a dot in this cycle) to form the output bit advdot. We also generate wr_advdot. In this way, if the output buffer is full or any of the specified avail flags is clear, the HCU will stall. When the end of the page is reached, in_page will be deasserted and the HCU will continue to produce 0 for all dots as long as the DNC requests data. A block diagram of the determine advdot unit is shown in FIG. 233.
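One possible reading of the advdot determination is sketched below in Python. This is an interpretation of the prose only, since the logic of FIG. 233 is not reproduced here: the function and argument names are hypothetical, output_ready stands in for the output buffer being able to accept a dot, and a single after_top_margin flag is assumed to serve both the contone/spot and tag paths. Bit positions follow Table 194.

```python
# Bit positions follow Table 194; the combination logic is an assumption.
BIT = {"dm": 0, "cp": 1, "s": 2, "tp": 3}


def bit(value, n):
    """Return True if bit n of value is set."""
    return (value >> n) & 1 == 1


def advdot(go, output_ready, avail, avail_mask, tm_mask,
           after_top_margin, in_target_page, in_tag_target_page):
    """Decide whether the HCU may advance a dot in this cycle (sketch)."""

    def required(name):
        n = BIT[name]
        if name == "dm":
            # Dither matrix: TMMask[0] applies in all margin areas.
            return bit(avail_mask if in_target_page else tm_mask, n)
        if not after_top_margin:
            return bit(tm_mask, n)      # top margin uses TMMask
        inside = in_tag_target_page if name == "tp" else in_target_page
        return bit(avail_mask, n) and inside

    sources_ok = all(avail[name] or not required(name) for name in BIT)
    return bool(go and output_ready and sources_ok)


flags = {"dm": True, "cp": False, "s": True, "tp": True}
# Contone required but not available inside the target page: stall.
print(advdot(True, True, flags, 0b0010, 0b0000, True, True, True))
# Same flags in a side margin: contone data is not needed there.
print(advdot(True, True, flags, 0b0010, 0b0000, True, False, True))
```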

[3759] The advance dot block also determines whether the current page needs a dither matrix, and indicates this to the dither matrix table interface block via the dm_read_enable signal. If no dither is required in the margins or in the target page then dm_read_enable will be 0 and no dither will be read in for this page.

[3760] 28.4.3.2 Position Unit

[3761] The position unit is responsible for outputting the position of the current dot (curr_pos, curr_line) and whether or not this dot is the last dot of a line (advline). Both curr_pos and curr_line are set to 0 at reset or when Go transitions from 0 to 1. The position unit relies on the advdot input signal to advance through the dots on a page. Whenever an advdot pulse is received, curr_pos gets incremented. If curr_pos equals max_dot then an advline pulse is generated as this is the last dot in a line, curr_line gets incremented, and curr_pos is reset to 0 to start counting the dots for the next line.
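The counting behaviour of the position unit can be modelled in a few lines of Python. The class name PositionUnit and the method step are hypothetical; the sketch assumes advline is reported for the advdot that consumes the last dot of a line.

```python
class PositionUnit:
    """Tracks (curr_pos, curr_line) as dots are advanced (illustrative)."""

    def __init__(self, max_dot):
        self.max_dot = max_dot   # MaxDot register: last dot index on a line
        self.curr_pos = 0
        self.curr_line = 0

    def step(self, advdot):
        """Apply one cycle; return True when advline is pulsed."""
        if not advdot:
            return False         # stalled: position is held
        if self.curr_pos == self.max_dot:
            self.curr_pos = 0    # last dot of the line: wrap to next line
            self.curr_line += 1
            return True
        self.curr_pos += 1
        return False


unit = PositionUnit(max_dot=2)   # a 3-dot line, for brevity
print([unit.step(True) for _ in range(3)], unit.curr_pos, unit.curr_line)
```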

[3762] The position unit also generates a filtered version of advline called dm_advline to indicate to the dither matrix pointers to increment to the next line. The dm_advline signal is only asserted when dither is required for that line.

    if ((after_top_margin AND avail_mask[0]) OR tm_mask[0]) then
        dm_advline = advline
    else
        dm_advline = 0

[3763] 28.4.3.3 Margin Unit

[3764] The responsibility of the margin unit is to determine whether the specific dot coordinate is within the page at all, within the target page or in a margin area (see FIG. 234). This unit is instantiated for both the contone/spot margin unit and the tag margin unit.

[3765] The margin unit takes the current dot and line position, and returns three flags.

[3766] the first flag, in_page, is 1 if the current dot is within the page, and 0 if it is outside the page.

[3767] the second flag, in_target_page, is 1 if the dot coordinate is within the target page area of the page, and 0 if it is within the target top/left/bottom/right margins.

[3768] the third flag, after_top_margin, is 1 if the current dot is below the target top margin, and 0 if it is within the target top margin.

[3769] A block diagram of the margin unit is shown in FIG. 235.
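As a rough illustration, the three flags can be derived from the margin registers of Table 193 as follows. The Margins container and margin_flags function are hypothetical names, and the comparisons are an assumption based on each register's description as the "first line" or "first dot" of a region.

```python
from dataclasses import dataclass


@dataclass
class Margins:
    page_margin_y: int  # first line considered off the page
    max_dot: int        # last dot index on a line
    top: int            # first line within the target page
    bottom: int         # first line in the target bottom margin
    left: int           # first dot within the target page
    right: int          # first dot in the target right margin


def margin_flags(m, curr_pos, curr_line):
    """Return (in_page, in_target_page, after_top_margin) for one dot."""
    in_page = curr_line < m.page_margin_y and curr_pos <= m.max_dot
    after_top_margin = curr_line >= m.top
    in_target_page = (
        in_page
        and m.top <= curr_line < m.bottom
        and m.left <= curr_pos < m.right
    )
    return in_page, in_target_page, after_top_margin


m = Margins(page_margin_y=100, max_dot=99, top=10, bottom=90, left=5, right=95)
print(margin_flags(m, curr_pos=5, curr_line=10))
print(margin_flags(m, curr_pos=4, curr_line=10))
```

The same function would serve the tag margin unit if supplied with the TagTopMargin, TagBottomMargin, TagLeftMargin and TagRightMargin values.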

[3770] 28.4.3.4 Dither Matrix Table Interface

[3771] The dither matrix table interface provides the interface to DRAM for the generation of dither cell values that are used in the halftoning process in the contone dotgen unit. The control flag dm_read_enable enables the reading of the dither matrix table line structure from DRAM. If dm_read_enable is 0, the dither matrix is not specified in DRAM and no DRAM accesses are attempted. The dither matrix table interface has an output flag dm_avail which specifies if the current line of the specified matrix is available. The HCU can be directed to stall when dm_avail is 0 by setting the appropriate bit in the HCU's AvailMask or TMMask registers. When dm_avail is 0 the value in the DitherConstant register is used as the dither cell values that are output to the contone dotgen unit.

[3772] The dither matrix table interface consists of a state machine that interfaces to the DRAM interface, a dither matrix buffer that provides dither matrix values, and a unit to generate the addresses for reading the buffer. FIG. 236 shows a block diagram of the dither matrix table interface.

[3773] 28.4.3.5 Dither Data Structure in DRAM

[3774] The dither matrix is stored in DRAM in 256-bit words, transferred to the HCU in 64-bit words and consumed by the HCU in bytes. Table 195 shows how the 64-bit words map to 256-bit word addresses, and Table 196 shows how the 8-bit dither values map into the 64-bit words.

TABLE 195 Dither Data stored in DRAM

| Address[21:5] | Data[255:192] | Data[191:128] | Data[127:64] | Data[63:0] |
|---|---|---|---|---|
| 00000 | D3 | D2 | D1 | D0 |
| 00001 | D7 | D6 | D5 | D4 |
| 00010 | D11 | D10 | D9 | D8 |
| 00011 | D15 | D14 | D13 | D12 |
| 00100 | D19 | D18 | D17 | D16 |
| etc. | | | | |

[3775] When the HCU first requests data from DRAM, the 64-bit word transfer order will be D0, D1, D2, D3. On the second request the transfer order will be D4, D5, D6, D7 and so on for other requests.

TABLE 196 Dither data stored in HCU's line buffer

| Dither index [7:0] | Data[7:0] |
|---|---|
| 00 | D0[7:0] |
| 01 | D0[15:8] |
| 02 | D0[23:16] |
| 03 | D0[31:24] |
| 04 | D0[39:32] |
| 05 | D0[47:40] |
| 06 | D0[55:48] |
| 07 | D0[63:56] |
| 08 | D1[7:0] |
| 09 | D1[15:8] |
| 0A | D1[23:16] |
| 0B | D1[31:24] |
| 0C | D1[39:32] |
| 0D | D1[47:40] |
| 0E | D1[55:48] |
| 0F | D1[63:56] |
| 10 | D2[7:0] |
| 11 | D2[15:8] |
| 12 | D2[23:16] |
| 13 | D2[31:24] |
| 14 | D2[39:32] |
| 15 | D2[47:40] |
| 16 | D2[55:48] |
| 17 | D2[63:56] |
| 18 | D3[7:0] |
| 19 | D3[15:8] |
| 1A | D3[23:16] |
| 1B | D3[31:24] |
| 1C | D3[39:32] |
| 1D | D3[47:40] |
| 1E | D3[55:48] |
| 1F | D3[63:56] |
| 20 | D4[7:0] |
| 21 | D4[15:8] |
| 22 | D4[23:16] |
| 23 | D4[31:24] |
| 24 | D4[39:32] |
| 25 | D4[47:40] |
| 26 | D4[55:48] |
| 27 | D4[63:56] |
| 28 | D5[7:0] |
| 29 | D5[15:8] |
| 2A | D5[23:16] |
| 2B | D5[31:24] |
| 2C | D5[39:32] |
| 2D | D5[47:40] |
| 2E | D5[55:48] |
| 2F | D5[63:56] |
| etc. | etc. |
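The index-to-byte mapping of Table 196 reduces to a divide and a shift, as the following Python sketch suggests. The function name dither_byte is hypothetical, and words is assumed to be the list of 64-bit values D0, D1, D2, ... in transfer order.

```python
def dither_byte(words, index):
    """Return the dither value at a line-buffer index (per Table 196).

    words -- 64-bit words in transfer order (D0, D1, D2, ...)
    index -- dither index; byte k of word Dn sits at index 8*n + k
    """
    word = words[index // 8]        # which 64-bit word (Dn)
    shift = 8 * (index % 8)         # least significant byte comes first
    return (word >> shift) & 0xFF


# Two example words whose bytes simply count upward from 0x01.
words = [0x0807060504030201, 0x100F0E0D0C0B0A09]
print([hex(dither_byte(words, i)) for i in (0x00, 0x07, 0x08, 0x0F)])
```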

[3776] 28.4.3.5.1 Dither Matrix Buffer

[3777] The state machine loads dither matrix table data a line at a time from DRAM and stores it in a buffer. A single line of the dither matrix is either 256 or 128 8-bit entries, depending on the programmable bit DoubleLineBuf. If this bit is enabled, a double-buffer mechanism is employed such that while one buffer is read from for the current line's dither matrix data (8 bits representing a single dither matrix entry), the other buffer is being written to with the next line's dither matrix data (64 bits at a time). Alternatively, the single buffer scheme can be used, where the data must be loaded at the end of the line, thus incurring a delay.

[3778] The single/double buffer is implemented using a 256 byte 3-port register array (two read ports, one write port), with the reads clocked at double the system clock rate (320 MHz) allowing 4 reads per clock cycle.

[3779] The dither matrix buffer unit also provides the mechanism for keeping track of the current read and write buffers, and providing the mechanism such that a buffer cannot be read from until it has been written to. In this case, each buffer is a line of the dither matrix, i.e. 256 or 128 bytes.

[3780] The dither matrix buffer maintains a read and write pointer for the dither matrix. The output value dm_avail is derived by comparing the read and write pointers to determine when the dither matrix is not empty. The write pointer wr_adr is incremented each time a 64-bit word is written to the dither matrix buffer and the read pointer rd_ptr is incremented each time dm_advline is received. If double_line_buf is 0 the rd_ptr will increment by 2, otherwise it will increment by 1. If the dither matrix buffer is full then no further writes will be allowed (buff_full=1), or if the buffer is empty no further buffer reads are allowed (buff_emp=1).

[3781] The read addresses are byte aligned and are generated by the read address generator. A single dither matrix entry is represented by 8 bits and an entry is read for each of the four contone planes in parallel. If double buffer is used (double_line_buf=1) the read address is derived from the 7-bit address from the read address generator and 1 bit from the read pointer. If double_line_buf=0 then the read address is the full 8 bits from the read address generator.

    if (double_line_buf == 1) then
        read_port[7:0] = {rd_ptr[0], rd_adr[6:0]} // concatenation
    else
        read_port[7:0] = rd_adr[7:0]

[3782] 28.4.3.5.2 Read Address Generator

[3783] For each contone plane there is an initial, lower and upper index to be used when reading dither cell values from the dither matrix double buffer. The read address for each plane is used to select a byte from the current 256-byte read buffer. When Go gets set (0 to 1 transition), or at the end of a line, the read addresses are set to their corresponding initial index. Otherwise, the read address generator relies on advdot to advance the addresses within the inclusive range specified by the lower and upper indices, represented by the following pseudocode:

    if (advdot == 1) then
        if (advline == 1) then
            rd_adr = dm_init_index
        elsif (rd_adr == dm_upr_index) then
            rd_adr = dm_lwr_index
        else
            rd_adr++
    else
        rd_adr = rd_adr
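As an illustration, the read address update for one contone plane might be modelled in Python as below. The function name `next_rd_adr` is an assumption made for this sketch; the argument names follow the pseudocode.

```python
def next_rd_adr(rd_adr, advdot, advline, dm_init_index, dm_lwr_index, dm_upr_index):
    """One update step of the per-plane read address (illustrative model)."""
    if advdot != 1:
        return rd_adr              # hold the current address
    if advline == 1:
        return dm_init_index       # end of line: restart at the initial index
    if rd_adr == dm_upr_index:
        return dm_lwr_index        # wrap inside the inclusive [lower, upper] range
    return rd_adr + 1


# Walk a 4-entry dither cell (indices 0..3) starting from initial index 2.
adr = 2
visited = []
for _ in range(5):
    adr = next_rd_adr(adr, 1, 0, 2, 0, 3)
    visited.append(adr)
```

The wrap from the upper index back to the lower index is what lets a dither cell narrower than the line be tiled across the page.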

[3784] 28.4.3.5.3 State Machine

[3785] The dither matrix is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4 clock cycles (64 bits per cycle). The protocol and timing for read accesses to DRAM is described in section 20.9.1 on page 240. Read accesses to DRAM are implemented by means of the state machine described in FIG. 238.

[3786] All counters and flags should be cleared after reset or when Go transitions from 0 to 1. While the Go bit is 1, the state machine relies on the dm_read_enable bit to tell it whether to attempt to read dither matrix data from DRAM. When dm_read_enable is clear, the state machine does nothing and remains in the idle state. When dm_read_enable is set, the state machine continues to load dither matrix data, 256 bits at a time (received over 4 clock cycles, 64 bits per cycle), while there is space available in the dither matrix buffer (buff_full != 1).

[3787] The read address and line_start_adr are initially set to start_dm_adr. The read address gets incremented after each read access. It takes 4 read accesses (double buffer) or 8 read accesses (single buffer) to load a line of dither matrix into the dither matrix buffer. A count is kept of the accesses to DRAM. When a read access completes and access_count equals 3 or 7, a line of dither matrix has just been loaded from DRAM and the read address is updated to line_start_adr plus line_increment so it points to the start of the next line of dither matrix (line_start_adr is also updated to this value). If the read address equals end_dm_adr then the next read address will be start_dm_adr, thus the read address wraps to point to the start of the area in DRAM where the dither matrix is stored.

[3788] The write address for the dither matrix buffer is implemented by means of a modulo-32 counter that is initially set to 0 and incremented when diu_hcu_rvalid is asserted.

[3789] FIG. 237 shows an example of setting start_dm_adr and end_dm_adr values in relation to the line increment and double line buffer settings. The calculation of end_dm_adr is:

    // end_dm_adr calculation
    dm_height = Dither matrix height in lines
    if (double_line_buf == 1)
        end_dm_adr[21:5] = start_dm_adr[21:5] + ((((dm_height − 1) * line_inc) + 3) << 5)
    else
        end_dm_adr[21:5] = start_dm_adr[21:5] + ((((dm_height − 1) * line_inc) + 7) << 5)
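The relationship between the end address and the sequence of DRAM read addresses can be checked with a small Python model. This is a sketch under stated assumptions: all addresses are expressed in 256-bit word units (the [21:5] field), the `<< 5` shift in the formula is taken to be a byte-address conversion and is therefore left out, and the helper names are hypothetical.

```python
def end_dm_adr(start_dm_adr, dm_height, line_inc, double_line_buf):
    """End address in 256-bit words: the last access of the last line."""
    accesses_per_line = 4 if double_line_buf else 8
    return start_dm_adr + (dm_height - 1) * line_inc + (accesses_per_line - 1)


def dram_read_addresses(start_dm_adr, dm_height, line_inc, double_line_buf, count):
    """Return the first `count` read addresses issued by the modelled state machine."""
    accesses_per_line = 4 if double_line_buf else 8
    end_adr = end_dm_adr(start_dm_adr, dm_height, line_inc, double_line_buf)
    adr = line_start_adr = start_dm_adr
    access_count = 0
    issued = []
    for _ in range(count):
        issued.append(adr)
        if adr == end_adr:                       # wrap to the start of the matrix
            adr = line_start_adr = start_dm_adr
            access_count = 0
        elif access_count == accesses_per_line - 1:
            line_start_adr += line_inc           # step to the next line
            adr = line_start_adr
            access_count = 0
        else:
            adr += 1
            access_count += 1
    return issued


# Three lines, double buffer (4 accesses per line), lines 8 words apart.
sequence = dram_read_addresses(100, 3, 8, True, 13)
```

With these example values the last address read before wrapping is 119, which matches `end_dm_adr(100, 3, 8, True)`.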

[3790] 28.4.4 Contone Dotgen Unit

[3791] The contone dotgen unit is responsible for producing a dot in up to 4 color planes per cycle. The contone dotgen unit also produces a cp_avail flag which specifies whether or not contone pixels are currently available, and the output hcu_cfu_advdot to request the CFU to provide the next contone pixel in up to 4 color planes.

[3792] The block diagram for the contone dotgen unit is shown in FIG. 239.

[3793] A dither unit provides the functionality for dithering a single contone plane. The contone image is only defined within the contone/spot margin area. As a result, if the input flag in_target_page is 0, then a constant contone pixel value is used for the pixel instead of the contone plane.

[3794] The resultant contone pixel is then halftoned. The dither value to be used in the halftoning process is provided by the control data unit. The halftoning process involves a comparison between a pixel value and its corresponding dither value. If the 8-bit contone value is greater than or equal to the 8-bit dither matrix value a 1 is output. If not, then a 0 is output. This means each entry in the dither matrix is in the range 1-255 (0 is not used).
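A minimal Python sketch of the dither unit behaviour described above is given here for illustration. The function names and the `constant_pixel` parameter name are assumptions; only the comparison rule and the in_target_page selection come from the text.

```python
def halftone_dot(contone: int, dither: int) -> int:
    """Output 1 when the 8-bit contone value is >= the 8-bit dither value."""
    return 1 if contone >= dither else 0


def dither_unit(in_target_page: int, cfu_pixel: int, constant_pixel: int, dither: int) -> int:
    """Select the contone source for one plane, then halftone it."""
    pixel = cfu_pixel if in_target_page == 1 else constant_pixel
    return halftone_dot(pixel, dither)
```

Because dither entries lie in the range 1-255, a contone value of 0 never produces a dot and a contone value of 255 always does, which is why 0 is not used as a dither value.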

[3795] Note that constant use is dependent on the in_target_page signal only: if in_target_page is 1 then the cfu_hcu_c*_data should be allowed to pass through, regardless of the stalling behaviour or the avail_mask[1] setting. This allows a constant value to be set up on the CFU output data, and the use of different constants while inside and outside the target page. The hcu_cfu_advdot will always be zero if the avail_mask[1] is zero.

[3796] 28.4.5 Spot Dotgen Unit

[3797] The spot dotgen unit is responsible for producing a dot of bi-level data per cycle. It deals with bi-level data (and therefore does not need to halftone) that comes from the LBD via the SFU. Like the contone layer, the bi-level spot layer is only defined within the contone/spot margin area. As a result, if input flag in_target_page is 0, then a constant dot value (typically this would be 0) is used for the output dot.

[3798] The spot dotgen unit also produces an s_avail flag which specifies whether or not spot dots are currently available for this spot plane, and the output hcu_sfu_advdot to request the SFU to provide the next bi-level data value. The spot dotgen unit can be represented by the following pseudocode:

    s_avail = sfu_hcu_avail
    if (in_target_page == 1 AND avail_mask[2] == 0) OR (in_target_page == 0) then
        hcu_sfu_advdot = 0
    else
        hcu_sfu_advdot = advdot
    if (in_target_page == 1) then
        sp = sfu_hcu_sdata
    else
        sp = sp_constant
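For illustration, the same behaviour can be written as a small Python function. It is a direct transcription of the pseudocode rather than a description of the actual logic; the function name `spot_dotgen` and the packing of avail_mask as an integer bit field are assumptions.

```python
def spot_dotgen(in_target_page, avail_mask, advdot,
                sfu_hcu_avail, sfu_hcu_sdata, sp_constant):
    """Return (s_avail, hcu_sfu_advdot, sp) for one cycle."""
    s_avail = sfu_hcu_avail
    mask_bit2 = (avail_mask >> 2) & 1            # avail_mask[2]
    if (in_target_page == 1 and mask_bit2 == 0) or in_target_page == 0:
        hcu_sfu_advdot = 0                       # never advance the SFU
    else:
        hcu_sfu_advdot = advdot
    sp = sfu_hcu_sdata if in_target_page == 1 else sp_constant
    return s_avail, hcu_sfu_advdot, sp
```

Note that the data selection depends only on in_target_page, while the advance request additionally depends on avail_mask[2].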

[3799] Note that constant use is dependent on the in_target_page signal only: if in_target_page is 1 then the sfu_hcu_data should be allowed to pass through, regardless of the stalling behaviour or the avail_mask setting. This allows a constant value to be set up on the SFU output data, and the use of different constants while inside and outside the target page. The hcu_sfu_advdot will always be zero if the avail_mask[2] is zero.

[3800] 28.4.6 Tag Dotgen Unit

[3801] This unit is very similar to the spot dotgen unit (see Section 28.4.5) in that it deals with bi-level data, in this case from the TE via the TFU. The tag layer is only defined within the tag margin area. As a result, if input flag in_tag_target_page is 0, then a constant dot value, tp_constant (typically this would be 0), is used for the output dot. The tagplane dotgen unit also produces a tp_avail flag which specifies whether or not tag dots are currently available for the tagplane, and the output hcu_tfu_advdot to request the TFU to provide the next bi-level data value.

[3802] The hcu_tfu_advdot generation is similar to the SFU and CFU, except it depends only on in_target_page and advdot. It does not take into account the avail mask when inside the target page.

[3803] 28.4.7 Dot Reorg Unit

[3804] The dot reorg unit provides a means of mapping the bi-level dithered data, the spot0 color, and the tag data to output inks in the actual printhead. Each dot reorg unit takes a set of 6 1-bit inputs and produces a single bit output that represents the output dot for that color plane.

[3805] The output bit is a logical combination of any or all of the input bits. This allows the spot color to be placed in any output color plane (including infrared for testing purposes), black to be merged into cyan, magenta and yellow (in the case of no black ink in the Memjet printhead), and tag dot data to be placed in a visible plane. An output for fixative can readily be generated by simply combining desired input bits.

[3806] The dot reorg unit contains a 64-bit lookup to allow complete freedom with regards to mapping. Since all possible combinations of input bits are accounted for in the 64-bit lookup, a given dot reorg unit can take the mapping of other reorg units into account. For example, a black plane reorg unit may produce a 1 only if the contone plane 3 or spot color inputs are set (this effectively composites black bi-level over the contone). A fixative reorg unit may generate a 1 if any 2 of the output color planes are set (taking into account the mappings produced by the other reorg units). If dead nozzle replacement is to be used (see section 29.4.2 on page 473), the dot reorg can be programmed to direct the dots of the specified color into the main plane, and 0 into the other. If a nozzle is then marked as dead in the DNC, swapping the bits between the planes will result in 0 in the dead nozzle, and the required data in the other plane.

[3807] If dead nozzle replacement is to be used, and there are no tags, the TE can be programmed with the position of dead nozzles and the resultant pattern used to direct dots into the specified nozzle row. If only fixed background TFS is to be used, a limited number of nozzles can be replaced. If variable tag data is to be used to specify dead nozzles, then large numbers of dead nozzles can be readily compensated for.

[3808] The dot reorg unit can be used to average out the nozzle usage when two rows of nozzles share the same ink and tag encoding is not being used. The TE can be programmed to produce a regular pattern (e.g. 0101 on one line, and 1010 on the next) and this pattern can be used as a directive to direct dots into the specified nozzle row.

[3809] Each reorg unit contains a 64-bit IOMapping value programmable as two 32-bit HCU registers, and a set of selection logic based on the 6-bit dot input (2⁶=64 bits), as shown in FIG. 240. The mapping of input bits to each of the 6 selection bits is as defined in Table 197.

TABLE 197 Mapping of input bits to 6 selection bits
address bit of lookup | tied to | likely interpretation
0 | bi-level dot from contone layer 0 | cyan
1 | bi-level dot from contone layer 1 | magenta
2 | bi-level dot from contone layer 2 | yellow
3 | bi-level dot from contone layer 3 | black
4 | bi-level spot0 dot | black
5 | bi-level tag dot | infra-red
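The lookup can be illustrated with a short Python sketch. It assumes that bit n of the 64-bit IOMapping value holds the output for lookup address n, which the text does not state explicitly, and the helper names are hypothetical. The example builds the black plane mapping mentioned earlier (a 1 only if the contone plane 3 or spot0 input is set).

```python
# Address bit positions from Table 197.
CONTONE0, CONTONE1, CONTONE2, CONTONE3, SPOT0, TAG = range(6)


def build_io_mapping(rule) -> int:
    """Build a 64-bit lookup from a rule over the 6 input bits.

    Assumption: bit n of the result is the output for lookup address n.
    """
    mapping = 0
    for address in range(64):
        bits = [(address >> i) & 1 for i in range(6)]
        if rule(bits):
            mapping |= 1 << address
    return mapping


def dot_reorg(io_mapping: int, address: int) -> int:
    """Look up the output dot for one 6-bit input combination."""
    return (io_mapping >> (address & 0x3F)) & 1


# Black plane: composite the spot0 dot over contone layer 3.
black_mapping = build_io_mapping(lambda b: b[CONTONE3] or b[SPOT0])

# The value would be programmed as two 32-bit registers.
io_mapping_low = black_mapping & 0xFFFFFFFF
io_mapping_high = black_mapping >> 32
```

Because every one of the 64 input combinations has its own output bit, any Boolean function of the six inputs can be expressed this way.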

[3810] 28.4.8 Output Buffer

[3811] The output buffer de-couples the stalling behaviour of the feeder units from the stalling behaviour of the DNC. The larger the buffer the greater the de-coupling. Currently the output buffer size is 2, but could be increased if needed at the cost of extra area.

[3812] If the Go bit is set to 0 no read or write of the output buffer is permitted. On a low to high transition of the Go bit the contents of the output buffer are cleared.

[3813] The output buffer also implements the interface logic to the DNC. If there is data in the output buffer the hcu_dnc_avail signal will be 1, otherwise it will be 0. If both hcu_dnc_avail and dnc_hcu_ready are 1 then data is read from the output buffer.

[3814] On the write side if there is space available in the output buffer the logic indicates to the control unit via the output_buff_full signal. The control unit will then allow writes to the output buffer via the wr_advdot signal. If the writes to the output buffer are after the end of a page (indicated by in_page equal to 0) then all dots written into the output buffer are set to zero.

[3815] 28.4.8.1 HCU to DNC Interface

[3816] FIG. 241 shows the timing diagram and representative logic of the HCU to DNC interface. The hcu_dnc_avail signal indicates to the DNC that the HCU has data available. The dnc_hcu_ready signal indicates to the HCU that the DNC is ready to accept data. When both signals are high data is transferred from the HCU to the DNC. Once the HCU indicates it has data available (setting the hcu_dnc_avail signal high) it can only set the hcu_dnc_avail low again after a dot is accepted by the DNC.

[3817] 28.4.9 Feeder to HCU Interfaces

[3818] FIG. 242 shows the feeder unit to HCU interface timing diagram, and FIG. 243 shows representative logic of the interface with the register positions. sfu_hcu_data and sfu_hcu_avail are always registered while hcu_sfu_advdot is not. The sfu_hcu_avail signal indicates to the HCU that the feeder unit has data available, and hcu_sfu_advdot indicates to the feeder unit that the HCU has captured the last dot. The HCU can never produce an advance dot pulse while the avail is low. The diagrams show the example of the SFU to HCU interface, but the same interface is used for the other feeder units TFU and CFU.

[3819] 29 Dead Nozzle Compensator (DNC)

[3820] 29.1 Overview

[3821] The Dead Nozzle Compensator (DNC) is responsible for adjusting Memjet dot data to take account of non-functioning nozzles in the Memjet printhead. Input dot data is supplied from the HCU, and the corrected dot data is passed out to the DWU. The high level data path is shown by the block diagram in FIG. 244.

[3822] The DNC compensates for dead nozzles by performing the following operations:

[3823] Dead nozzle removal, i.e. turn the nozzle off

[3824] Ink replacement by direct substitution i.e. K->K

[3825] Ink replacement by indirect substitution i.e. K->CMY

[3826] Error diffusion to adjacent nozzles

[3827] Fixative corrections

[3828] The DNC is required to efficiently support up to 5% dead nozzles, under the expected DRAM bandwidth allocation, with no restriction on where dead nozzles are located, and to handle any fixative correction due to nozzle compensations. Performance must degrade gracefully after 5% dead nozzles.

[3829] 29.2 Dead Nozzle Identification

[3830] Dead nozzles are identified by means of a position value and a mask value. Position information is represented by a 10-bit delta encoded format, where the 10-bit value defines the number of dots between dead nozzle columns¹⁹. Along with the delta information the DNC also reads the 6-bit dead nozzle mask (dn_mask) for the defined dead nozzle position. Each bit in the dn_mask corresponds to an ink plane. A set bit indicates that the nozzle for the corresponding ink plane is dead. The dead nozzle table format is shown in FIG. 245. The DNC reads dead nozzle information from DRAM in single 256-bit accesses. A 10-bit delta encoding scheme is chosen so that each table entry is 16 bits wide, and 16 entries fit exactly in each 256-bit read. Using 10-bit delta encoding means that the maximum distance between dead nozzle columns is 1023 dots. It is possible that dead nozzles may be spaced further than 1023 dots from each other, so a null dead nozzle identifier is required. A null dead nozzle identifier is defined as a 6-bit dn_mask of all zeros. These null dead nozzle identifiers should also be used so that:

[3831] the dead nozzle table is a multiple of 16 entries (so that it is aligned to the 256-bit DRAM locations)

[3832] the dead nozzle table spans the complete length of the line, i.e. the first entry in the dead nozzle table should have a delta from the first nozzle column in a line and the last entry in the dead nozzle table should correspond to the last nozzle column in a line.

[3833] Note that the DNC deals with the width of a page. This may or may not be the same as the width of the printhead (the PHI may introduce some margining to the page so that its dot output matches the width of the printhead). Care must be taken when programming the dead nozzle table so that dead nozzle positions are correctly specified with respect to the page and printhead.

[3834] 29.3 DRAM Storage and Bandwidth Requirement

[3835] The memory required is largely a factor of the number of dead nozzles present in the printhead (which in turn is a factor of the printhead size). The DNC is required to read a 16-bit entry from the dead nozzle table for every dead nozzle. Table 198 shows the DRAM storage and average²⁰ bandwidth requirements for the DNC for different percentages of dead nozzles and different page sizes.

TABLE 198 Dead Nozzle storage and average bandwidth requirements
Page size | % Dead Nozzles | Dead nozzle table Memory (KBytes) | Bandwidth (bits/cycle)
A4^(a) | 5% | 1.4^(c) | 0.8^(d)
A4^(a) | 10% | 2.7 | 1.6
A4^(a) | 15% | 4.1 | 2.4
A3^(b) | 5% | 1.9 | 0.8
A3^(b) | 10% | 3.8 | 1.6
A3^(b) | 15% | 5.7 | 2.4

[3836] 29.4 Nozzle Compensation

[3837] The DNC receives 6 bits of dot information every cycle from the HCU, 1 bit per color plane. When the dot position corresponds to a dead nozzle column, the associated 6-bit dn_mask indicates which ink plane(s) contains a dead nozzle(s). The DNC first deletes dots destined for the dead nozzle. It then replaces those dead dots, either by placing the data destined for the dead nozzle into an adjacent ink plane (direct substitution) or into a number of ink planes (indirect substitution). After ink replacement, if a dead nozzle is made active again then the DNC performs error diffusion. Finally, following the dead nozzle compensation mechanisms the fixative, if present, may need to be adjusted due to new nozzles being activated, or dead nozzles being removed.

[3838] 29.4.1 Dead Nozzle Removal

[3839] If a nozzle is defined as dead, then the first action for the DNC is to turn off (zero) the dot data destined for that nozzle. This is done by a bit-wise ANDing of the inverse of the dn_mask with the dot value.

[3840] 29.4.2 Ink Replacement

[3841] Ink replacement is a mechanism where data destined for the dead nozzle is placed into an adjacent ink plane of the same color (direct substitution, i.e. K->K_(alternative)), or placed into a number of ink planes, the combination of which produces the desired color (indirect substitution, i.e. K->CMY). Ink replacement is performed by filtering out ink belonging to nozzles that are dead and then adding back in an appropriately calculated pattern. This two step process allows the optional re-inclusion of the ink data into the original dead nozzle position to be subsequently error diffused. In the general case, fixative data destined for a dead nozzle should not be left active intending it to be later diffused.

[3842] The ink replacement mechanism has 6 ink replacement patterns, one per ink plane, programmable by the CPU. The dead nozzle mask is ANDed with the dot data to see if there are any planes where the dot is active but the corresponding nozzle is dead. The resultant value forms an enable, on a per ink basis, for the ink replacement process. If replacement is enabled for a particular ink, the values from the corresponding replacement pattern register are ORed into the dot data. The output of the ink replacement process is then filtered so that error diffusion is only allowed for the planes in which error diffusion is enabled. The output of the ink replacement logic is ORed with the resultant dot after dead nozzle removal. See the figure on page 565 for implementation details.

[3843] For example, consider the printhead color configuration C,M,Y,K₁,K₂,IR with input dot data from the HCU of b101100. Assume that the K₁ ink plane and IR ink plane for this position are dead, so the dead nozzle mask is b000101. The DNC first removes the dead nozzle by zeroing the K₁ plane to produce b101000. Then the dead nozzle mask is ANDed with the dot data to give b000100, which selects the ink replacement pattern for K₁ (in this case the ink replacement pattern for K₁ is configured as b000010, i.e. ink replacement into the K₂ plane). Providing error diffusion for K₂ is enabled, the output from the ink replacement process is b000010. This is ORed with the output of dead nozzle removal to produce the resultant dot b101010. As can be seen, the dot data in the defective K₁ nozzle was removed and replaced by a dot in the adjacent K₂ nozzle in the same dot position, i.e. direct substitution.

[3844] In the example above the K₁ ink plane could be compensated for by indirect substitution, in which case the ink replacement pattern for K₁ would be configured as b111000 (substitution into the CMY color planes), and this is ORed with the output of dead nozzle removal to produce the resultant dot b111000. Here the dot data in the defective K₁ ink plane was removed and placed into the CMY ink planes.
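The two worked examples can be reproduced with a short Python sketch of dead nozzle removal followed by ink replacement. It uses the bit patterns exactly as written in the text (C leftmost, IR rightmost), omits the DiffuseEnable filtering step, and uses a hypothetical function name and a dictionary of patterns keyed by plane bit purely for illustration.

```python
K1_PLANE = 0b000100  # K1 bit in the C,M,Y,K1,K2,IR ordering used in the example


def remove_and_replace(dot: int, dn_mask: int, replace_patterns: dict) -> int:
    """Dead nozzle removal followed by ink replacement (filter step omitted)."""
    removed = dot & ~dn_mask & 0x3F          # zero the dots on dead nozzles
    enable = dot & dn_mask                   # planes that were active but dead
    replacement = 0
    for plane_bit, pattern in replace_patterns.items():
        if enable & plane_bit:
            replacement |= pattern           # OR in the pattern for that plane
    return removed | replacement


# Direct substitution: K1 -> K2.
direct = remove_and_replace(0b101100, 0b000101, {K1_PLANE: 0b000010})

# Indirect substitution: K1 -> CMY.
indirect = remove_and_replace(0b101100, 0b000101, {K1_PLANE: 0b111000})
```

In both cases the IR plane is marked dead but carries no dot, so it contributes nothing to the enable value and no replacement is triggered for it.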

[3845] 29.4.3 Error Diffusion

[3846] Based on the programming of the lookup table the dead nozzle may be left active after ink replacement. In such cases the DNC can compensate using error diffusion. Error diffusion is a mechanism where dead nozzle dot data is diffused to adjacent dots.

[3847] When a dot is active and its destined nozzle is dead, the DNC will attempt to place the data into an adjacent dot position, if one is inactive. If both dots are inactive then the choice is arbitrary, and is determined by a pseudo random bit generator. If both neighbor dots are already active then the bit cannot be compensated by diffusion.

[3848] Since the DNC needs to look at neighboring dots to determine where to place the new bit (if required), the DNC works on a set of 3 dots at a time. For any given set of 3 dots, the first dot received from the HCU is referred to as dot A, the second as dot B, and the third as dot C. The relationship is shown in FIG. 246.

[3849] For any given set of dots ABC, only B can be compensated for by error diffusion if B is defined as dead. A 1 in dot B will be diffused into either dot A or dot C if possible. If there is already a 1 in dot A or dot C then a 1 in dot B cannot be diffused into that dot.

[3850] The DNC must support adjacent dead nozzles. Thus if dot A is defined as dead and has previously been compensated for by error diffusion, then the dot data from dot B should not be diffused into dot A. Similarly, if dot C is defined as dead, then dot data from dot B should not be diffused into dot C.

[3851] Error diffusion should not cross line boundaries. If dot B contains a dead nozzle and is the first dot in a line then dot A represents the last dot from the previous line. In this case an active bit on a dead nozzle of dot B should not be diffused into dot A. Similarly, if dot B contains a dead nozzle and is the last dot in a line then dot C represents the first dot of the next line. In this case an active bit on a dead nozzle of dot B should not be diffused into dot C.

[3852] Thus, as a rule, a 1 in dot B cannot be diffused into dot A if

[3853] a 1 is already present in dot A,

[3854] dot A is defined as dead,

[3855] or dot A is the last dot in a line.

[3856] Similarly, a 1 in dot B cannot be diffused into dot C if

[3857] a 1 is already present in dot C,

[3858] dot C is defined as dead,

[3859] or dot C is the first dot in a line.

[3860] If B is defined to be dead and the dot value for B is 0, then no compensation needs to be done and dots A and C do not need to be changed.

[3861] If B is defined to be dead and the dot value for B is 1, then B is changed to 0 and the DNC attempts to place the 1 from B into either A or C:

[3862] If the dot can be placed into both A and C, then the DNC must choose between them. The preference is given by the current output from the random bit generator, 0 for “prefer left” (dot A) or 1 for “prefer right” (dot C).

[3863] If the dot can be placed into only one of A and C, then the 1 from B is placed into that position.

[3864] If the dot cannot be placed into either one of A or C, then the DNC cannot place the dot in either position.

TABLE 199 Error-Diffusion Truth Table when dot B is dead
Input: A OR A dead OR A last in line | Input: B | Input: C OR C dead OR C first in line | Input: Rand^(a) | Output: A | Output: B | Output: C
0 | 0 | 0 | X | A input | 0 | C input
0 | 0 | 1 | X | A input | 0 | C input
0 | 1 | 0 | 0 | 1^(b) | 0 | C input
0 | 1 | 0 | 1 | A input | 0 | 1^(b)
0 | 1 | 1 | X | 1^(b) | 0 | C input
1 | 0 | 0 | X | A input | 0 | C input
1 | 0 | 1 | X | A input | 0 | C input
1 | 1 | 0 | X | A input | 0 | 1^(b)
1 | 1 | 1 | X | A input | 0 | C input

[3865] Table 199 shows the truth table for DNC error diffusion operation when dot B is defined as dead.

[3866] a. Output from random bit generator. Determines direction of error diffusion (0=left, 1=right)

[3867] b. Marks a 1 inserted by the DNC (shown with bold emphasis in the original table)
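The rules and truth table above can be expressed as a small Python function for a single ink plane. This is a sketch of the stated rules only; the function name and the `a_blocked` / `c_blocked` parameters (which fold together "defined as dead" and the line boundary condition) are assumptions made for the example.

```python
def diffuse_dead_dot(a, b, c, a_blocked, c_blocked, rand):
    """Error diffusion for dot B when B is defined as dead.

    a, b, c      -- input dot values (0 or 1)
    a_blocked    -- True if A is dead or A is the last dot of the previous line
    c_blocked    -- True if C is dead or C is the first dot of the next line
    rand         -- random bit: 0 prefers left (A), 1 prefers right (C)
    Returns the output values (A, B, C); B is always output as 0.
    """
    if b == 0:
        return a, 0, c                       # nothing to compensate
    can_use_a = a == 0 and not a_blocked
    can_use_c = c == 0 and not c_blocked
    if can_use_a and can_use_c:
        return (1, 0, c) if rand == 0 else (a, 0, 1)
    if can_use_a:
        return 1, 0, c
    if can_use_c:
        return a, 0, 1
    return a, 0, c                           # the dot cannot be placed
```

The random bit only matters in the single case where both neighbours are free, which corresponds to the two rows of Table 199 that do not have X in the Rand column.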

[3868] The random bit value used to arbitrarily select the direction of diffusion is generated by a 32-bit maximum length random bit generator. The generator generates a new bit for each dot in a line regardless of whether the dot is dead or not. The random bit generator can be initialized with a 32-bit programmable seed value.
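A 32-bit random bit generator of this kind can be sketched in Python as a linear feedback shift register. The text does not give the tap positions, so the taps used here (stages 32, 22, 2 and 1 with XNOR feedback, a tap set commonly tabulated as maximum length for 32 bits) are an assumption, as are the function name and the choice of the feedback bit as the output.

```python
MASK32 = 0xFFFFFFFF


def lfsr_step(state: int):
    """Advance an XNOR-feedback LFSR by one bit.

    Assumed taps: stages 32, 22, 2, 1 (bits 31, 21, 1, 0 when zero-based).
    Returns (new_state, random_bit).
    """
    xor = ((state >> 31) ^ (state >> 21) ^ (state >> 1) ^ state) & 1
    bit = xor ^ 1                            # XNOR of the tapped bits
    new_state = ((state << 1) | bit) & MASK32
    return new_state, bit


# One new bit per dot, starting from a programmable seed.
state = 0x00000000
bits = []
for _ in range(4):
    state, bit = lfsr_step(state)
    bits.append(bit)
```

With XNOR feedback the all-ones state feeds back a 1 forever, so that value must not be used as a seed, while the all-zeros reset value is a valid starting state.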

[3869] 29.4.4 Fixative Correction

[3870] After the dead nozzle compensation methods have been applied to the dot data, the fixative, if present, may need to be adjusted due to new nozzles being activated, or dead nozzles being removed. For each output dot the DNC determines if fixative is required (using the FixativeRequiredMask register) for the new compensated dot data word and whether fixative is activated already for that dot. For the DNC to do so it needs to know the color plane that has fixative; this is specified by the FixativeMask1 configuration register. Table 200 indicates the actions to take based on these calculations.

TABLE 200 Truth table for fixative correction
Fixative present | Fixative required | Action
1 | 1 | Output dot as is.
1 | 0 | Clear fixative plane.
0 | 1 | Attempt to add fixative.
0 | 0 | Output dot as is.

[3871] The DNC also allows the specification of another fixative plane, specified by the FixativeMask2 configuration register, with FixativeMask1 having the higher priority over FixativeMask2. When attempting to add fixative the DNC first tries to add it into the planes defined by FixativeMask1. However, if any of these planes is dead then it tries to add fixative by placing it into the planes defined by FixativeMask2.
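One possible reading of the fixative correction rules is sketched below in Python. Several details are assumptions made for the example rather than statements from the text: fixative counts as present if any bit of either fixative mask is set in the dot, clearing removes the bits of both masks, and fixative is left off when the planes of both masks contain a dead nozzle. The function name is hypothetical.

```python
def fixative_correct(dot, dn_mask, fix_mask1, fix_mask2, fix_required_mask):
    """Apply Table 200 to one compensated 6-bit dot (illustrative reading)."""
    all_fixative = fix_mask1 | fix_mask2
    present = (dot & all_fixative) != 0          # assumption: any fixative bit set
    required = (dot & fix_required_mask) != 0
    if present == required:
        return dot                               # output dot as is
    if present:
        return dot & ~all_fixative & 0x3F        # clear fixative plane(s)
    # Required but absent: try FixativeMask1 first, then FixativeMask2.
    for mask in (fix_mask1, fix_mask2):
        if mask != 0 and (mask & dn_mask) == 0:  # usable only if no plane is dead
            return dot | mask
    return dot                                   # fixative could not be added
```

For example, with FixativeMask1 = b000001, FixativeMask2 = b000010 and FixativeRequiredMask = b111100, a dot of b000100 gains the b000001 fixative bit unless that plane is dead, in which case b000010 is used instead.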

[3872] Note that the fixative defined by FixativeMask1 and FixativeMask2 could possibly be multi-part fixative, i.e. 2 bits could be set in FixativeMask1 with the fixative being a combination of both inks.

[3873] 29.5 Implementation

[3874] A block diagram of the DNC is shown in FIG. 247.

[3875] 29.5.1 Definitions of I/O

TABLE 201 DNC port list and description
Port name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | System Clock.
prst_n | 1 | In | System reset, synchronous active low.
PCU interface
pcu_dnc_sel | 1 | In | Block select from the PCU. When pcu_dnc_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[6:2] | 5 | In | PCU address bus. Only 5 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
dnc_pcu_rdy | 1 | Out | Ready signal to the PCU. When dnc_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on dnc_pcu_datain is valid.
dnc_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU.
DIU interface
dnc_diu_rreq | 1 | Out | DNC unit requests DRAM read. A read request must be accompanied by a valid read address.
dnc_diu_radr[21:5] | 17 | Out | Read address to DIU, 256-bit word aligned.
diu_dnc_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on dnc_diu_radr.
diu_dnc_rvalid | 1 | In | Read data valid, active high. Indicates that valid read data is now on the read data bus, diu_data.
diu_data[63:0] | 64 | In | Read data from DIU.
HCU interface
dnc_hcu_ready | 1 | Out | Indicates that DNC is ready to accept data from the HCU.
hcu_dnc_avail | 1 | In | Indicates valid data present on hcu_dnc_data.
hcu_dnc_data[5:0] | 6 | In | Output bi-level dot data in 6 ink planes.
DWU interface
dwu_dnc_ready | 1 | In | Indicates that DWU is ready to accept data from the DNC.
dnc_dwu_avail | 1 | Out | Indicates valid data present on dnc_dwu_data.
dnc_dwu_data[5:0] | 6 | Out | Output bi-level dot data in 6 ink planes.

[3876] 29.5.2 Configuration Registers

[3877] The configuration registers in the DNC are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for the description of the protocol and timing diagrams for reading and writing registers in the DNC. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the DNC. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of dnc_pcu_datain. Table 202 lists the configuration registers in the DNC.

TABLE 202 DNC configuration registers
Address (DNC_base +) | Register name | #bits | Value on reset | Description
Control registers
0x00 | Reset | 1 | 0x1 | A write to this register causes a reset of the DNC.
0x04 | Go | 1 | 0x0 | Writing 1 to this register starts the DNC. Writing 0 to this register halts the DNC. When Go is asserted all counters, flags etc. are cleared or given their initial value, but configuration registers keep their values. When Go is deasserted the state-machines go to their idle states but all counters and configuration registers keep their values. This register can be read to determine if the DNC is running (1 = running, 0 = stopped).
Setup registers (constant during processing)
0x10 | MaxDot | 16 | 0x0000 | This is the maximum dot number − 1 present across a page. For example if a page contains 13824 dots, then MaxDot will be 13823. [Note that this number may or may not be the same as the number of dots across the printhead as some margining may be introduced in the PHI.]
0x14 | LSFR | 32 | 0x0000_0000 | The current value of the LFSR register used as the 32-bit maximum length random bit generator. Users can write to this register to program a seed value for the 32-bit maximum length random bit generator. Must not be all 1s for taps implemented in XNOR form. (It is expected that writing a seed value will not occur during the operation of the LFSR.) This LSFR value could also have a possible use as a random source in program code.
0x20 | FixativeMask1 | 6 | 0x00 | Defines the higher priority fixative plane(s). Bit 0 represents the settings for plane 0, bit 1 for plane 1 etc. For each bit: 1 = the ink plane contains fixative. 0 = the ink plane does not contain fixative.
0x24 | FixativeMask2 | 6 | 0x00 | Defines the lower priority fixative plane(s). Bit 0 represents the settings for plane 0, bit 1 for plane 1 etc. Used only when FixativeMask1 planes are dead. For each bit: 1 = the ink plane contains fixative. 0 = the ink plane does not contain fixative.
0x28 | FixativeRequiredMask | 6 | 0x00 | Identifies the ink planes that require fixative. Bit 0 represents the settings for plane 0, bit 1 for plane 1 etc. For each bit: 1 = the ink plane requires fixative. 0 = the ink plane does not require fixative (e.g. ink is self-fixing).
0x30 | DnTableStartAdr[21:5] | 17 | 0x0_0000 | Start address of Dead Nozzle Table in DRAM, specified in 256-bit words.
0x34 | DnTableEndAdr[21:5] | 17 | 0x0_0000 | End address of Dead Nozzle Table in DRAM, specified in 256-bit words, i.e. the location containing the last entry in the Dead Nozzle Table. The Dead Nozzle Table should be aligned to a 256-bit boundary; if necessary it can be padded with null entries.
0x40-0x54 | PlaneReplacePattern[5:0] | 6 × 6 | 0x00 | Defines the ink replacement pattern for each of the 6 ink planes. PlaneReplacePattern[0] is the ink replacement pattern for plane 0, PlaneReplacePattern[1] is the ink replacement pattern for plane 1, etc. For each 6-bit replacement pattern for a plane, a 1 in any bit position indicates the alternative ink planes to be used for this plane.
0x58 | DiffuseEnable | 6 | 0x3F | Defines whether, after ink replacement, error diffusion is allowed to be performed on each plane. Bit 0 represents the settings for plane 0, bit 1 for plane 1 etc. For each bit: 1 = error diffusion is enabled. 0 = error diffusion is disabled.
Debug registers (read only)
0x60 | DncOutputDebug | 8 | N/A | Bit 7 = dwu_dnc_ready, Bit 6 = dnc_dwu_avail, Bits 5-0 = dnc_dwu_data
0x64 | DncReplaceDebug | 14 | N/A | Bit 13 = edu_ready, Bit 12 = iru_avail, Bits 11-6 = iru_dn_mask, Bits 5-0 = iru_data
0x68 | DncDiffuseDebug | 14 | N/A | Bit 13 = dwu_dnc_ready, Bit 12 = dnc_dwu_avail, Bits 11-6 = edu_dn_mask, Bits 5-0 = edu_data

[3878] 29.5.3 Ink Replacement Unit

[3879] FIG. 248 shows a sub-block diagram for the ink replacement unit.

[3880] 29.5.3.1 Control Unit

[3881] The control unit is responsible for reading the dead nozzle table from DRAM and making it available to the DNC via the dead nozzle FIFO. The dead nozzle table is read from DRAM in single 256-bit accesses, receiving the data from the DIU over 4 clock cycles (64 bits per cycle). The protocol and timing for read accesses to DRAM are described in section 20.9.1 on page 240. Reading from DRAM is implemented by means of the state machine shown in FIG. 249. All counters and flags should be cleared after reset. When Go transitions from 0 to 1, all counters and flags should take their initial value. While the Go bit is 1, the state machine requests a read access from the dead nozzle table in DRAM provided there is enough space in its FIFO.

[3882] A modulo-4 counter, rd_count, is used to count each of the 64-bit values received in a 256-bit read access. It is incremented whenever diu_dnc_rvalid is asserted. When Go is 1, dn_table_radr is set to dn_table_start_adr. As each 64-bit value is returned, indicated by diu_dnc_rvalid being asserted, dn_table_radr is compared to dn_table_end_adr.

[3883] If rd_count equals 3 and dn_table_radr equals dn_table_end_adr, then dn_table_radr is updated to dn_table_start_adr.

[3884] If rd_count equals 3 and dn_table_radr does not equal dn_table_end_adr, then dn_table_radr is incremented by 1.
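The address update rules above can be sketched in software as follows. This is a minimal behavioural model, not the hardware state machine of FIG. 249; the function names are hypothetical, and each call to step_read_address is assumed to correspond to one cycle in which diu_dnc_rvalid is asserted.

```python
def step_read_address(radr, rd_count, start_adr, end_adr):
    """Model one returned 64-bit value (one diu_dnc_rvalid cycle).

    Returns the updated (dn_table_radr, rd_count) pair.
    """
    if rd_count == 3:
        # Last 64-bit value of the 256-bit word: wrap or advance the address.
        radr = start_adr if radr == end_adr else radr + 1
    return radr, (rd_count + 1) % 4  # rd_count is a modulo-4 counter


def word_addresses(start_adr, end_adr, n_words):
    """List the 256-bit word address used for each of n_words accesses."""
    radr, rd_count, seen = start_adr, 0, []
    for _ in range(n_words):
        seen.append(radr)
        for _ in range(4):  # four 64-bit values per 256-bit access
            radr, rd_count = step_read_address(radr, rd_count, start_adr, end_adr)
    return seen
```

With a three-word table at addresses 10 to 12, the model reads words 10, 11, 12 and then wraps back to 10.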

[3885] A count is kept of the number of 64-bit values in the FIFO. When diu_dnc_rvalid is 1, data is written to the FIFO by asserting wr_en, and fifo_contents and fifo_wr_adr are both incremented.

[3886] When fifo_contents[3:0] is greater than 0 and edu_ready is 1, dnc_hcu_ready is asserted to indicate that the DNC is ready to accept dots from the HCU. If hcu_dnc_avail is also 1 then a dotadv pulse is sent to the GenMask unit, indicating the DNC has accepted a dot from the HCU, and iru_avail is also asserted. After Go is set, a single preload pulse is sent to the GenMask unit once the FIFO contains data.

[3887] When a rd_adv pulse is received from the GenMask unit, fifo_rd_adr[4:0] is incremented to select the next 16-bit value. If fifo_rd_adr[1:0]=11 then the next 64-bit value is read from the FIFO by asserting rd_en, and fifo_contents[3:0] is decremented.

[3888] 29.5.3.2 Dead Nozzle FIFO

[3889] The dead nozzle FIFO is conceptually a FIFO with a 64-bit input and a 16-bit output, to account for the 64-bit data transfers from the DIU and the individual 16-bit entries in the dead nozzle table that are used in the GenMask unit. In reality, the FIFO is 8 entries deep and 64 bits wide (to accommodate two 256-bit accesses).

[3890] On the DRAM side of the FIFO the write address is 64-bit aligned, while on the GenMask side the read address is 16-bit aligned, i.e. the upper 3 bits are input as the read address for the FIFO and the lower 2 bits are used to select 16 bits from the 64 bits (the first 16 bits read correspond to bits 15-0, the second 16 bits to bits 31-16, etc.).
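The read-side addressing can be illustrated with a short sketch. The FIFO is modelled here as a list of eight 64-bit integers, and the function name is hypothetical; only the split of the 5-bit read address into an entry index and a 16-bit lane follows the description above.

```python
def fifo_read16(fifo, fifo_rd_adr):
    """Return the 16-bit dead nozzle table entry selected by a 5-bit read address."""
    entry = (fifo_rd_adr >> 2) & 0x7   # upper 3 bits select one of 8 entries
    lane = fifo_rd_adr & 0x3           # lower 2 bits select 16 of the 64 bits
    return (fifo[entry] >> (16 * lane)) & 0xFFFF


# Example contents: two 64-bit entries followed by six empty ones.
example_fifo = [0x4444333322221111, 0x8888777766665555] + [0] * 6
```

Reading addresses 0 to 3 returns the four 16-bit values of the first entry starting from bits 15-0, and address 4 moves on to the second entry.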

[3891] 29.5.3.3 GenMask Unit

[3892] The GenMask unit generates the 6-bit dn_mask that is sent to the replace unit. It consists of a 10-bit delta counter and a mask register.

[3893] After Go is set, the GenMask unit will receive a preload pulse from the control unit indicating the first dead nozzle table entry is available at the output of the dead nozzle FIFO and should be loaded into the delta counter and mask register. A rd_adv pulse is generated so that the next dead nozzle table entry is presented at the output of the dead nozzle FIFO. The delta counter is decremented every time a dotadv pulse is received. When the delta counter reaches 0, it gets loaded with the current delta value output from the dead nozzle FIFO, i.e. bits 15-6, and the mask register gets loaded with the mask output from the dead nozzle FIFO, i.e. bits 5-0. A rd_adv pulse is then generated so that the next dead nozzle table entry is presented at the output of the dead nozzle FIFO.
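The layout of a 16-bit dead nozzle table entry, with the delta in bits 15-6 and the mask in bits 5-0, can be sketched as a pair of helper functions. The function names are hypothetical and are used for illustration only.

```python
def pack_dn_entry(delta, mask):
    """Build a 16-bit entry from a 10-bit delta and a 6-bit dead nozzle mask."""
    if not 0 <= delta < (1 << 10):
        raise ValueError("delta must fit in 10 bits")
    if not 0 <= mask < (1 << 6):
        raise ValueError("mask must fit in 6 bits")
    return (delta << 6) | mask


def unpack_dn_entry(entry):
    """Split a 16-bit entry into (delta, mask)."""
    return (entry >> 6) & 0x3FF, entry & 0x3F
```

For instance, a delta of 5 with planes 0 and 2 dead packs to 0x0145, and unpacking that value returns the same pair.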

[3894] When the delta counter is 0, the value in the mask register is output as the dn_mask; otherwise the dn_mask is all 0s.

[3895] The GenMask unit has no knowledge of the number of dots in a line; it simply loads a counter to count the delta from one dead nozzle column to the next. Thus, as described in section 29.2 on page 472, the dead nozzle table should include null identifiers if necessary so that the dead nozzle table covers the first and last nozzle column in a line.

[3896] 29.5.3.4 Replace Unit

[3897] Dead nozzle removal and ink replacement are implemented by the combinatorial logic shown in FIG. 250. Dead nozzle removal is performed by bit-wise ANDing of the inverse of the dn_mask with the dot value.

[3898] The ink replacement mechanism has 6 ink replacement patterns, one per ink plane, programmable by the CPU. The dead nozzle mask is ANDed with the dot data to see if there are any planes where the dot is active but the corresponding nozzle is dead. The resultant value forms an enable, on a per ink basis, for the ink replacement process. If replacement is enabled for a particular ink, the values from the corresponding replacement pattern register are ORed into the dot data. The output of the ink replacement process is then filtered so that error diffusion is only allowed for the planes in which error diffusion is enabled.

[3899] The output of the ink replacement process is ORed with the resultant dot after dead nozzle removal. If the dot position does not contain a dead nozzle then the dn_mask will be all 0s and the dot, hcu_dnc_data, will be passed through unchanged.
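The removal and replacement steps can be sketched as follows. This is an illustrative model only: the function name is hypothetical, the actual logic is that of FIG. 250, and the filtering by DiffuseEnable that governs later error diffusion is deliberately left out.

```python
PLANE_MASK = 0x3F  # 6 ink planes, one bit per plane


def replace_dot(dot, dn_mask, replace_patterns):
    """Apply dead nozzle removal and ink replacement to one 6-bit dot.

    replace_patterns is a list of six 6-bit values, one per ink plane.
    """
    # Planes where the dot is active but the nozzle is dead enable replacement.
    enable = dot & dn_mask
    # Dead nozzle removal: AND the dot with the inverse of the mask.
    cleaned = dot & ~dn_mask & PLANE_MASK
    replacement = 0
    for plane in range(6):
        if (enable >> plane) & 1:
            replacement |= replace_patterns[plane]
    # The replacement output is ORed with the dot after dead nozzle removal.
    return (cleaned | replacement) & PLANE_MASK
```

With plane 0 dead and a replacement pattern of 0b000110 for that plane, a dot that fires only plane 0 comes out as 0b000110, while a dot with an all-zero dn_mask passes through unchanged.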

[3900] 29.5.4 Error Diffusion Unit

[3901]FIG. 251 shows a sub-block diagram for the error diffusion unit.

[3902] 29.5.4.1 Random Bit Generator

[3903] The random bit value used to arbitrarily select the direction of diffusion is generated by a maximum length 32-bit LFSR. The tap points and feedback generation are shown in FIG. 252. The LFSR generates a new bit for each dot in a line regardless of whether the dot is dead or not, i.e. shifting of the LFSR is enabled when advdot equals 1. The LFSR can be initialised with a 32-bit programmable seed value, random_seed. This seed value is loaded into the LFSR whenever a write occurs to the RandomSeed register. Note that the seed value must not be all 1s as this causes the LFSR to lock up.
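The lock-up behaviour can be illustrated with a small model. The tap points of FIG. 252 are not reproduced in the text, so the taps below (32, 22, 2, 1 with XNOR feedback) are an assumption, chosen because that set is commonly tabulated for 32-bit maximum length LFSRs; the function name is likewise hypothetical.

```python
MASK32 = 0xFFFFFFFF
ASSUMED_TAPS = (32, 22, 2, 1)  # assumption: not taken from FIG. 252


def lfsr_step(state):
    """Advance a 32-bit XNOR-feedback LFSR by one dot (one advdot pulse).

    Returns (new_state, random_bit).
    """
    feedback = 1  # starting from 1 turns the XOR of the taps into an XNOR
    for tap in ASSUMED_TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    new_state = ((state << 1) | feedback) & MASK32
    return new_state, feedback
```

With XNOR feedback an all-1s state regenerates itself on every shift, which is consistent with the restriction on the seed value noted above, whereas an all-0s seed is legal and moves on to a new state.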

[3904] 29.5.4.2 Advance Dot Unit

[3905] The advance dot unit is responsible for determining in a given cycle whether or not the error diffuse unit will accept a dot from the ink replacement unit or make a dot available to the fixative correct unit and on to the DWU. It therefore receives the dwu_dnc_ready control signal from the DWU and the iru_avail flag from the ink replacement unit, and generates the dnc_dwu_avail and edu_ready control flags.

[3906] Only the dwu_dnc_ready signal needs to be checked to see if a dot can be accepted; the advance dot unit asserts edu_ready to indicate this. If the error diffuse unit is ready to accept a dot and the ink replacement unit has a dot available, then an advdot pulse is given to shift the dot into the pipeline in the diffuse unit. Note that since the error diffusion operates on 3 dots, the advance dot unit ignores dwu_dnc_ready initially until 3 dots have been accepted by the diffuse unit. Similarly, dnc_dwu_avail is not asserted until the diffuse unit contains 3 dots and the ink replacement unit has a dot available.

[3907] 29.5.4.3 Diffuse Unit

[3908] The diffuse unit contains the combinatorial logic to implement the truth table from Table. The diffuse unit receives a dot consisting of 6 color planes (1 bit per plane) as well as an associated 6-bit dead nozzle mask value.

[3909] Error diffusion is applied to all 6 planes of the dot in parallel. Since error diffusion operates on 3 dots, the diffuse unit has a pipeline of 3 dots and their corresponding dead nozzle mask values.

[3910] The first dot received is referred to as dot A, the second as dot B, and the third as dot C. Dots are shifted along the pipeline whenever advdot is 1. A count is also kept of the number of dots received. It is incremented whenever advdot is 1, and wraps to 0 when it reaches max_dot. When the dot count is 0, dot C corresponds to the first dot in a line. When the dot count is 1, dot A corresponds to the last dot in a line.

[3911] In any given set of 3 dots only dot B can be defined as containing a dead nozzle(s). Dead nozzles are identified by bits set in iru_dn_mask. If dot B contains a dead nozzle(s), the corresponding bit(s) in dot A, dot C, the dead nozzle mask value for A, the dead nozzle mask value for C, the dot count, as well as the random bit value are input to the truth table logic and the dots A, B and C are assigned accordingly. If dot B does not contain a dead nozzle then the dots are shifted along the pipeline unchanged.

[3912] 29.5.5 Fixative Correction Unit

[3913] The fixative correction unit consists of combinatorial logic to implement fixative correction as defined in Table 203. For each output dot the DNC determines if fixative is required for the new compensated dot data word and whether fixative is activated already for that dot.
    FixativePresent = ((FixativeMask1 | FixativeMask2) & edu_data) != 0
    FixativeRequired = (FixativeRequiredMask & edu_data) != 0

[3914] It then looks up the truth table to see what action, if any, needs to be taken.
TABLE 203 Truth table for fixative correction
Fixative   Fixative
Present    Required   Action                     Output
1          1          Output dot as is.          dnc_dwu_data = edu_data
1          0          Clear fixative plane.      dnc_dwu_data = (edu_data) & ~(FixativeMask1 | FixativeMask2)
0          1          Attempt to add fixative.   if (FixativeMask1 & DnMask) != 0
                                                     dnc_dwu_data = (edu_data) | (FixativeMask2 & ~DnMask)
                                                 else
                                                     dnc_dwu_data = (edu_data) | (FixativeMask1)
0          0          Output dot as is.          dnc_dwu_data = edu_data

[3915] When attempting to add fixative the DNC first tries to add it into the plane defined by FixativeMask1. However, if this plane is dead then it tries to add fixative by placing it into the plane defined by FixativeMask2. Note that if FixativeMask1 and FixativeMask2 are both all 0s then the dot data will not be changed.
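The truth table for fixative correction can be expressed as a short function. This is a sketch for illustration; the function and parameter names are hypothetical, and all values are assumed to be 6-bit plane masks.

```python
def fixative_correct(edu_data, dn_mask, fixative_mask1, fixative_mask2,
                     fixative_required_mask):
    """Return dnc_dwu_data for one dot according to the fixative truth table."""
    both_masks = fixative_mask1 | fixative_mask2
    present = (both_masks & edu_data) != 0
    required = (fixative_required_mask & edu_data) != 0

    if present and not required:
        # Clear the fixative plane(s).
        return edu_data & ~both_masks & 0x3F
    if required and not present:
        # Attempt to add fixative, falling back to the lower priority plane
        # when the higher priority plane is dead.
        if (fixative_mask1 & dn_mask) != 0:
            return edu_data | (fixative_mask2 & ~dn_mask & 0x3F)
        return edu_data | fixative_mask1
    # Present and required, or neither: output the dot as is.
    return edu_data
```

For example, with FixativeMask1 on plane 5, FixativeMask2 on plane 4 and plane 0 requiring fixative, a dot firing only plane 0 gains plane 5, or plane 4 if the plane 5 nozzle is dead.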

[3916] 30 Dotline Writer Unit (DWU)

[3917] 30.1 Overview

[3918] The Dotline Writer Unit (DWU) receives 1 dot (6 bits) of color information per cycle from the DNC. Dot data received is bundled into 256-bit words and transferred to the DRAM. The DWU (in conjunction with the LLU) implements a dot line FIFO mechanism to compensate for the physical placement of nozzles in a printhead, and provides data rate smoothing to allow for local complexities in the dot data generate pipeline.

[3919] 30.2 Physical Requirement Imposed by the Printhead

[3920] The physical placement of nozzles in the printhead means that in one firing sequence of all nozzles, dots will be produced over several print lines. The printhead consists of 12 rows of nozzles, one for each color of odd and even dots. Odd and even nozzles are separated by D₂ print lines and nozzles of different colors are separated by D₁ print lines. See FIG. 254 for reference. The first color to be printed is the first row of nozzles encountered by the incoming paper. In the example this is color 0 odd, although this is dependent on the printhead type (see [10] for other printhead arrangements). Paper passes under the printhead moving downwards.

[3921] For example, suppose the physical separation of each half row is 80 μm, equating to D₁=D₂=5 print lines at 1600 dpi. This means that in one firing sequence, color 0 odd nozzles will fire on dotline L, color 0 even nozzles will fire on dotline L-D₁, color 1 odd nozzles will fire on dotline L-D₁-D₂ and so on over 6 color planes of odd and even nozzles. The total number of lines fired over is given as 0+5+5 . . . +5=0+11×5=55. See FIG. 255 for an example diagram.

[3922] It is expected that the physical spacing of the printhead nozzles will be 80 μm (or 5 dot lines), although there is no dependency on nozzle spacing. The DWU is configurable to allow other line nozzle spacings.
TABLE 204 Relationship between nozzle color/sense and line firing
           Even line encountered first    Odd line encountered first
Color      Sense    Line                  Sense    Line
Color 0    Even     L                     Even     L-5
           Odd      L-5                   Odd      L
Color 1    Even     L-10                  Even     L-15
           Odd      L-15                  Odd      L-10
Color 2    Even     L-20                  Even     L-25
           Odd      L-25                  Odd      L-20
Color 3    Even     L-30                  Even     L-35
           Odd      L-35                  Odd      L-30
Color 4    Even     L-40                  Even     L-45
           Odd      L-45                  Odd      L-40
Color 5    Even     L-50                  Even     L-55
           Odd      L-55                  Odd      L-50

[3923] 30.3 Line Rate De-Coupling

[3924] The DWU block is required to compensate for the physical spacing between lines of nozzles. It does this by storing dot lines in a FIFO (in DRAM) until such time as they are required by the LLU for dot data transfer to the printhead interface. Colors are stored separately because they are needed at different times by the LLU. The dot line store must store enough lines to compensate for the physical line separation of the printhead but can optionally store more lines to allow system level data rate variation between the read (printhead feed) and write (dot data generation pipeline) sides of the FIFOs.

[3925] A logical representation of the FIFOs is shown in FIG. 256, where N is defined as the optional number of extra half lines in the dot line store for data rate de-coupling.

[3926] 30.4 Dot Line Store Storage Requirements

[3927] For an arbitrary page width of d dots (where d is even), the number of dots per half line is d/2.

[3928] For interline spacing of D₂ and inter-color spacing of D₁, with C colors of odd and even half lines, the number of half lines of storage is (C−1)(D₂+D₁)+D₁.

[3929] For N extra half line stores for each color odd and even, the storage is given by (N*C*2).

[3930] The total storage requirement is ((C−1)(D₂+D₁)+D₁+(N*C*2))*d/2 in bits.

[3931] Note that when determining the storage requirements for the dot line store, the number of dots per line is the page width and not necessarily the printhead width. The page width is often the dot margin number of dots less than the printhead width. They can be the same size for full bleed printing.

[3932] For example, in an A4 page a line consists of 13824 dots at 1600 dpi, or 6912 dots per half dot line. To store just enough dot lines to account for an inter-line nozzle spacing of 5 dot lines it would take 55 half dot lines for color 5 odd, 50 dot lines for color 5 even and so on, giving 55+50+45 . . . 10+5+0=330 half dot lines in total. If it is assumed that N=4 then the storage required to store 4 extra half lines per color is 4×12=48, in total giving 330+48=378 half dot lines. Each half dot line is 6912 dots, which at 1 bit per dot gives a total storage requirement of 6912 dots×378 half dot lines/8 bits=Approx 319 Kbytes. Similarly for an A3 size page with 19488 dots per line, 9744 dots per half line×378 half dot lines/8=Approx 899 Kbytes.
TABLE 205 Storage requirement for dot line store
Page    Nozzle     Lines required   Storage (N = 0)   Lines required   Storage (N = 4)
size    spacing    (N = 0)          Kbytes            (N = 4)          Kbytes
A4      4          264              223               312              263
A4      5          330              278               378              319
A3      4          264              628               312              742
A3      5          330              785               378              899
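The A4 figures in the worked example can be reproduced with a few lines of arithmetic. The sketch below follows the example's own method of summing the line depth needed by each of the 12 nozzle rows; the function names are hypothetical, a single spacing value is assumed for both D₁ and D₂, and 1 Kbyte is taken as 1024 bytes.

```python
def half_lines_required(colors=6, spacing=5, extra_per_row=0):
    """Half dot lines needed across all nozzle rows (odd and even per color)."""
    rows = 2 * colors
    # Row r lags the first row by r * spacing lines: 0 + 5 + ... + 55 for 12 rows.
    compensation = sum(spacing * row for row in range(rows))
    return compensation + extra_per_row * rows


def storage_kbytes(dots_per_line, half_lines):
    """Storage in Kbytes at 1 bit per dot, using half-line width."""
    bits = (dots_per_line // 2) * half_lines
    return bits / 8 / 1024
```

For an A4 line of 13824 dots, this gives 330 half lines and about 278 Kbytes with N = 0, and 378 half lines and about 319 Kbytes with N = 4, in agreement with the A4 rows of Table 205.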

[3933] The potential size of the dot line store makes it unfeasible to implement in on-chip SRAM, requiring the dot line store to be implemented in embedded DRAM. This allows a configurable dotline store where unused storage can be redistributed for use by other parts of the system.

[3934] 30.5 Nozzle Row Skew

[3935] Due to construction limitations of the bi-lithic printhead it is possible that nozzle rows may be misaligned relative to each other. Odd and even rows, and adjacent color rows, may be horizontally misaligned by up to 2 dot positions. Vertical misalignment can also occur but is compensated for in the LLU and not considered here. The DWU is required to compensate for the horizontal misalignment.

[3936] Dot data from the HCU (through the DNC) produces a dot of 6 colors all destined for the same physical location on paper. If the nozzle rows in the printhead are aligned as shown in FIG. 254 then no adjustment of the dot data is needed.

[3937] A conceptual misaligned printhead is shown in FIG. 257. The exact shape of the row alignment is arbitrary, although it is most likely to be sloping (if sloping, it could be sloping in either direction). The DWU is required to adjust the shape of the dot streams to take account of the join between printhead ICs. The introduction of the join shape before the data is written to the DRAM means that the PHI sees a single crossover point in the data, since all lines are the same length and the crossover point (since all rows are of equal length) is a vertical line, i.e. the crossover is at the same time for all even rows, and at the same time for all odd rows, as shown in FIG. 258.

[3938] To insert the shape of the join into the dot stream, for each line we must first insert the dots for non-printable area 1, then the printable area data (from the DNC), and then finally the dots for non-printable area 2. This can also be considered as: first produce the dots for non-printable area 1 for line n, and then a repetition of:

[3939] 1. produce the dots for the printable area for line n (from the DNC)

[3940] 2. produce the dots for the non-printable area 2 (for line n) followed by the dots of non-printable area 1 (for line n+1)

[3941] The reason for considering the problem this way is that regardless of the shape of the join, the shape of non-printable area 2 merged with the shape of non-printable area 1 will always be a rectangle, since the widths of non-printable areas 1 and 2 are identical and the lengths of each row are identical. Hence step 2 can be accomplished by simply inserting a constant number (MaxNozzleSkew) of 0 dots into the stream.

[3942] For example, if the color n even row non-printable area 1 is of length X, then the color n even row non-printable area 2 will be of length MaxNozzleSkew−X. The split between non-printable areas 1 and 2 is defined by the NozzleSkew registers.

[3943] Data from the DNC is destined for the printable area only. The DWU must generate the data destined for the non-printable areas, and insert DNC dot data correctly into the dot data stream before writing dot data to the FIFOs. The DWU inserts the shape of the misalignment into the dot stream by delaying dot data destined for different nozzle rows by the relative misalignment skew amount.
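The split of the non-printable dots around the printable data can be sketched for a single nozzle row. This is an illustration only: the function name is hypothetical, dots are modelled as a Python list of 0/1 values for one row, and the skew is counted in dots of that row.

```python
def skewed_row(printable_dots, nozzle_skew, max_nozzle_skew):
    """Build one row's line: area 1 zeros, printable data, area 2 zeros."""
    if not 0 <= nozzle_skew <= max_nozzle_skew:
        raise ValueError("nozzle_skew must lie between 0 and max_nozzle_skew")
    area1 = [0] * nozzle_skew                      # non-printable area 1 (length X)
    area2 = [0] * (max_nozzle_skew - nozzle_skew)  # area 2 (MaxNozzleSkew - X)
    return area1 + list(printable_dots) + area2
```

Whatever skew a row has, the resulting line is always MaxNozzleSkew dots longer than the printable data, which is why all rows end up the same length.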

[3944] 30.6 Local Buffering

[3945] An embedded DRAM is expected to be of the order of 256 bits wide, which results in 27 words per half line of an A4 page, and 54 words per half line of A3. This requires 27 words×12 half colors (6 colors odd and even)=324×256-bit DRAM accesses over a dotline print time, equating to 6 bits per cycle (equal to the DNC generate rate of 6 bits per cycle). Each half color is required to be double buffered: while one buffer is being filled, the other buffer is being written to DRAM. This results in 256 bits×2 buffers×12 half colors, i.e. 6144 bits in total.

[3946] The buffer requirement can be reduced by using 1.5 buffering, where the DWU is filling 128 bits while the remaining 256 bits are being written to DRAM. While this reduces the required buffering locally, it increases the peak bandwidth requirement to the DRAM. With 2× buffering the average and peak DRAM bandwidth requirement is the same and is 6 bits per cycle; alternatively, with 1.5× buffering the average DRAM bandwidth requirement is 6 bits per cycle but the peak bandwidth requirement is 12 bits per cycle. The amount of buffering used will depend on the DRAM bandwidth available to the DWU unit.
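The A4 buffering figures can be checked with a short calculation. The sketch assumes one dot is produced per cycle, so that a dotline print time is taken as 13824 cycles for an A4 line; the variable names are hypothetical, and the 1.5× total is derived here rather than quoted from the text.

```python
import math

WORD_BITS = 256        # DRAM word width
HALF_COLORS = 12       # 6 colors, odd and even
A4_DOTS_PER_LINE = 13824

words_per_half_line = math.ceil((A4_DOTS_PER_LINE // 2) / WORD_BITS)
accesses_per_line = words_per_half_line * HALF_COLORS
# Average DRAM bandwidth, assuming one dot (one cycle) per dot position.
bits_per_cycle = accesses_per_line * WORD_BITS / A4_DOTS_PER_LINE

double_buffer_bits = WORD_BITS * 2 * HALF_COLORS
# 1.5x buffering: 256 bits draining plus 128 bits filling per half color.
one_and_half_buffer_bits = (WORD_BITS + WORD_BITS // 2) * HALF_COLORS
```

This yields 27 words per half line, 324 accesses per dotline, an average of 6 bits per cycle, 6144 bits of double buffering and, under the stated assumption, 4608 bits for 1.5× buffering.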

[3947] Should the DWU fail to get the required DRAM access within the specified time, the DWU will stall the DNC data generation. The DWU will issue the stall in sufficient time for the DNC to respond and still not cause a FIFO overrun. Should the stall persist for a sufficiently long time, the PHI will be starved of data and be unable to deliver data to the printhead in time. The sizing of the dotline store FIFO and internal FIFOs should be chosen so as to prevent such a stall happening.

[3948] 30.7 Dotline Data in Memory

[3949] The dot data shift register order in the printhead is shown in FIG. 254 (the transmit order is the opposite of the shift register order). In the example the type 0 printhead IC transmit order is increasing even color data followed by decreasing odd color data. The type 1 printhead IC transmit order is decreasing odd color data followed by increasing even color data. For both printhead ICs the even data is always in increasing order and the odd data is always in decreasing order. The PHI controls which printhead IC data gets shifted to.

[3950] From this it is beneficial to store even data in increasing order in DRAM and odd data in decreasing order. While this order suits the example printhead, other printheads exist where it would be beneficial to store even data in decreasing order and odd data in increasing order, hence the order is configurable. The order in which data is stored in memory is controlled by setting the ColorLineSense register.

[3951] The dot order in DRAM for increasing and decreasing sense is shown in FIG. 260 and FIG. 261 respectively. For each line in the dot store the order is the same (although for odd lines the numbering will be different, the order will remain the same). Dot data from the DNC is always received in increasing dot number order. For increasing sense, dot data is bundled into 256-bit words and written in increasing order in DRAM, word 0 first, then word 1, and so on to word N, where N is the number of words in a line.

[3952] For decreasing sense, dot data is also bundled into 256-bit words, but is written to DRAM in decreasing order, i.e. word N is written first, then word N−1, and so on to word 0. For both increasing and decreasing sense the data is aligned to bit 0 of a word, i.e. increasing sense always starts at bit 0, decreasing sense always finishes at bit 0.

[3953] Each half color is configured independently of any other color. The ColorBaseAdr register specifies the position where data for a particular dotline FIFO will begin writing to. Note that for increasing sense colors the ColorBaseAdr register specifies the address of the first word of the first line of the FIFO, whereas for decreasing sense colors the ColorBaseAdr register specifies the address of the last word of the first line of the FIFO.

[3954] Dot data received from the DNC is bundled in 256-bit words and transferred to the DRAM. Each line of data is stored consecutively in DRAM, with each line separated by ColorLineInc number of words.

[3955] For each line stored in DRAM the DWU increments the line count and calculates the DRAM address for the next line to store.

[3956] This process continues until ColorFifoSize number of lines are stored, after which the DRAM address will wrap back to the ColorBaseAdr address.
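The per-line address sequence can be sketched as follows. The function and parameter names are hypothetical; in particular, line_stride_words stands for the separation between consecutive lines in 256-bit words and is an assumption about how the programmed ColorLineInc value is applied.

```python
def line_start_addresses(color_base_adr, line_stride_words, color_fifo_size, n_lines):
    """Return the DRAM word address at which each of n_lines lines begins."""
    addresses = []
    address, line_count = color_base_adr, 0
    for _ in range(n_lines):
        addresses.append(address)
        line_count += 1
        if line_count == color_fifo_size:
            # FIFO is full length: wrap back to the base address.
            line_count, address = 0, color_base_adr
        else:
            address += line_stride_words
    return addresses
```

For a three-line FIFO based at word 100 with a stride of 27 words, successive lines start at 100, 127 and 154, and the fourth line wraps back to 100.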

[3957] As each line is written to the FIFO, the DWU increments the FifoFillLevel register, and as the LLU reads a line from the FIFO the FifoFillLevel register is decremented. The LLU indicates that it has completed reading a line by a high pulse on the llu_dwu_line_rd line.

[3958] When the number of lines stored in the FIFO is equal to the MaxWriteAhead value, the DWU will indicate to the DNC that it is no longer able to receive data (i.e. a stall) by deasserting the dwu_dnc_ready signal.

[3959] The ColorEnable register determines which color planes should be processed. If a plane is turned off, data is ignored for that plane and no DRAM accesses for that plane are generated.

[3960] 30.8 Specifying Dot FIFOs

[3961] The dot line FIFOs when accessed by the LLU are specified differently than when accessed by the DWU. The DWU uses a start address and number of lines value to specify a dot FIFO, whereas the LLU uses a start and end address for each dot FIFO. The mechanisms differ to allow more efficient implementations in each block.

[3962] As a result of limitations in the LLU the dot FIFOs must be specified contiguously and increasing in DRAM. See section 31.6 on page 504 for further information.

[3963] 30.9 Implementation

[3964] 30.9.1 Definitions of I/O
TABLE 206 DWU I/O Definition
Port name | Pins | I/O | Description
Clocks and Resets
Pclk | 1 | In | System Clock
prst_n | 1 | In | System reset, synchronous active low
DNC Interface
dwu_dnc_ready | 1 | Out | Indicates that DWU is ready to accept data from the DNC.
dnc_dwu_avail | 1 | In | Indicates valid data present on dnc_dwu_data.
dnc_dwu_data[5:0] | 6 | In | Input bi-level dot data in 6 ink planes.
LLU Interface
dwu_llu_line_wr | 1 | Out | DWU line write. Indicates that the DWU has completed a full line write. Active high.
llu_dwu_line_rd | 1 | In | LLU line read. Indicates that the LLU has completed a line read. Active high.
PCU Interface
pcu_dwu_sel | 1 | In | Block select from the PCU. When pcu_dwu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[7:2] | 6 | In | PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
dwu_pcu_rdy | 1 | Out | Ready signal to the PCU. When dwu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on dwu_pcu_datain is valid.
dwu_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU.
DIU Interface
dwu_diu_wreq | 1 | Out | DWU requests DRAM write. A write request must be accompanied by a valid write address together with valid write data and a write valid.
dwu_diu_wadr[21:5] | 17 | Out | Write address to DIU, 17 bits wide (256-bit aligned word).
diu_dwu_wack | 1 | In | Acknowledge from DIU that write request has been accepted and new write address can be placed on dwu_diu_wadr.
dwu_diu_data[63:0] | 64 | Out | Data from DWU to DIU. 256-bit word transfer over 4 cycles. First 64 bits is bits 63:0 of 256-bit word. Second 64 bits is bits 127:64 of 256-bit word. Third 64 bits is bits 191:128 of 256-bit word. Fourth 64 bits is bits 255:192 of 256-bit word.
dwu_diu_wvalid | 1 | Out | Signal from DWU indicating that data on dwu_diu_data is valid.

[3965] 30.9.2 DWU Partition

[3966] 30.9.3 Configuration Registers

[3967] The configuration registers in the DWU are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for a description of the protocol and timing diagrams for reading and writing registers in the DWU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the DWU. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of dwu_pcu_datain. Table 207 lists the configuration registers in the DWU.
TABLE 207 DWU registers description
Address DWU_base+ | Register | #bits | Reset | Description
Control Registers
0x00 | Reset | 1 | 0x1 | Active low synchronous reset, self de-activating. A write to this register will cause a DWU block reset.
0x04 | Go | 1 | 0x0 | Active high bit indicating the DWU is programmed and ready to use. A low to high transition will cause DWU block internal states to reset (configuration registers are not reset).
Dot Line Store Configuration
0x08-0x34 | ColorBaseAdr[11:0][21:5] | 12 × 17 | 0x00000 | Specifies the base address (in words) in memory where data from a particular half color (N) will be placed. For increasing sense colors the ColorBaseAdr register specifies the address of the first word of the first line of the FIFO, whereas for decreasing sense colors the ColorBaseAdr register specifies the address of the last word of the first line of the FIFO.
0x38-0x64 | ColorFifoSize[11:0] | 12 × 8 | 0x00 | Indicates the number of lines in the FIFO before the line increment will wrap around in memory. Bus 0,1 - Even, Odd line color 0. Bus 2,3 - Even, Odd line color 1. Bus 4,5 - Even, Odd line color 2. Bus 6,7 - Even, Odd line color 3. Bus 8,9 - Even, Odd line color 4. Bus 10,11 - Even, Odd line color 5.
0x68 | ColorLineSense | 2 | 0x2 | Specifies whether data written to DRAM for this half color is increasing or decreasing sense. 0 - Decreasing sense. 1 - Increasing sense. Bit 0 defines even color sense, bit 1 defines odd color sense.
0x6C | ColorEnable | 6 | 0x3F | Indicates whether a particular color is active or not. When inactive no data is written to DRAM for that color. 0 - Color off. 1 - Color on. One bit per color, bit 0 is Color 0 and so on.
0x70 | MaxWriteAhead | 8 | 0x00 | Specifies the maximum number of lines that the DWU can be ahead of the LLU.
0x74 | LineSize | 16 | 0x0000 | Indicates the number of dots per line produced by the DWU.
0x78 | MaxNozzleSkew | 4 | 0x0 | Specifies the number of dot-pairs the DWU needs to generate to flush the data skew buffers. Corresponds to the non-printable area of the printhead.
0x7C-0xA8 | NozzleSkew | 12 × 4 | 0x0 | Specifies the relative skew of dot data nozzle rows in the printhead. Valid range is 0 (no skew) through to 12. Units represent dot-pairs: a skew of 1 for a row represents two dots on the page. Bus 0,1 - Even, Odd line color 0. Bus 2,3 - Even, Odd line color 1. Bus 4,5 - Even, Odd line color 2. Bus 6,7 - Even, Odd line color 3. Bus 8,9 - Even, Odd line color 4. Bus 10,11 - Even, Odd line color 5.
0xAC | ColorLineInc | 8 | 0x00 | Specifies the number of words (256-bit words) per dot line - 1.
Working Registers
0xB0 | LineDotCnt | 16 | 0x0000 | Indicates the number of remaining dots in the current line. (Read Only)
0xB4 | FifoFillLevel | 8 | 0x00 | Number of lines in the FIFO, written to but not read. (Read Only)

[3968] A low to high transition of the Go register causes the internal states of the DWU to be reset. All configuration registers will remain the same. The block indicates the transition to other blocks via the dwu_go_pulse signal.

[3969] 30.9.4 Data Skew

[3970] The data skew block inserts the shape of the printhead join into the dot data stream by delaying dot data by the relative nozzle skew amount (given by nozzle_skew). It generates zero fill data that is introduced into the dot data stream to achieve the relative skew (and also to flush dot data from the delay registers).

[3971] The data skew block consists of 12 12-bit shift registers, one per color odd and even. The shift registers are in groups of 6, one group for even colors and one for odd colors. Each time a valid data word is received from the DNC the dot data is shifted into either the odd or even group of shift registers. The odd_even_sel register determines which group of shift registers is valid for that cycle and alternates for each new valid data word. When a valid word is received for a group of shift registers, the shift register is shifted by one location with the new data word shifted into the registers (the top word in the register will be discarded).

[3972] When the dot counter determines that the data skew block should zero fill (zero_fill), the data skew block will shift zero dot data into the shift registers until the line has completed. During this time the DNC will be stalled by the de-assertion of the dwu_dnc_ready signal.

[3973] The data skew block selects dot data from the shift registers and passes it to the buffer address generator block. The data bits selected are determined by the configured index values in the NozzleSkew registers.
    // determine when data is valid
    data_valid = (((dnc_dwu_avail == 1) OR (zero_fill == 1)) AND (dwu_ready == 1))
    // implement the zero fill mux
    if (zero_fill == 1) then
        dot_data_in = 0
    else
        dot_data_in = dnc_dwu_data
    // the data delay buffers
    if (dwu_go_pulse == 1) then
        data_delay[1:0][11:0][5:0] = 0    // reset all delay buffers
        // odd=1, even=0
        odd_even_sel = 0
    elsif (data_valid == 1) then {
        odd_even_sel = ~odd_even_sel
        // update the odd/even buffers, with shift
        data_delay[odd_even_sel][11:1][5:0] = data_delay[odd_even_sel][10:0][5:0]    // shift data
        data_delay[odd_even_sel][0][5:0] = dot_data_in[5:0]    // shift in new data
        // select the correct output data
        for (i=0; i<6; i++) {
            // skew selector
            skew = nozzle_skew[{i,odd_even_sel}]    // temporary variable
            // data select array, include data delay and input dot data
            data_select[12:0] = {data_delay[odd_even_sel][11:0], dot_data_in}
            // mux output the data word to next block (13 to 1 mux)
            dot_data[i] = data_select[skew][i]
        }
    }
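The delay-and-select behaviour of the data skew block can be modelled in software. The sketch below is one interpretation of the pseudocode: it assumes registered (non-blocking) semantics, so the current odd_even_sel and the pre-shift delay contents are used for selection and the updates take effect afterwards. The class name is hypothetical, and nozzle_skew is indexed as 2 × color + odd_even_sel.

```python
class DataSkew:
    """Behavioural model of the data skew block for 6 colors, odd and even."""

    def __init__(self, nozzle_skew):
        self.nozzle_skew = list(nozzle_skew)         # 12 entries, each 0..12
        self.delay = [[0] * 12 for _ in range(2)]    # [odd_even_sel][depth]
        self.odd_even_sel = 0                        # even = 0, odd = 1

    def push(self, dot_data_in):
        """Accept one valid 6-bit dot word and return the skewed 6-bit word."""
        sel = self.odd_even_sel
        # 13-entry select array: index 0 is the new word, 1..12 the delayed words.
        data_select = [dot_data_in] + self.delay[sel]
        dot_data = 0
        for color in range(6):
            skew = self.nozzle_skew[2 * color + sel]
            dot_data |= ((data_select[skew] >> color) & 1) << color
        # Shift the new word in, discarding the oldest, then alternate groups.
        self.delay[sel] = [dot_data_in] + self.delay[sel][:-1]
        self.odd_even_sel ^= 1
        return dot_data
```

With all skews set to 0 each word passes straight through, while a skew of 1 on the color 0 even row makes that bit appear one even dot later than it was received.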

[3974] 30.9.5 Fifo Fill Level

[3975] The DWU keeps a running total of the number of lines in the dot store FIFO. Each time the DWU writes a line to DRAM (determined by the DIU interface subblock and signalled via line_wr) it increments the filllevel and signals the line increment to the LLU (pulse on dwu_llu_line_wr). Conversely, if it receives an active llu_dwu_line_rd pulse from the LLU, the filllevel is decremented. If the filllevel increases to the programmed max level (max_write_ahead) then the DWU stalls and indicates back to the DNC by de-asserting the dwu_dnc_ready signal.

[3976] If one or more of the DIU buffers fill, the DIU interface signals the fill level logic via the buf_full signal, which in turn causes the DWU to de-assert the dwu_dnc_ready signal to stall the DNC. The buf_full signals will remain active until the DIU services a pending request from the full buffer, reducing the buffer level.

[3977] When the dot counter block detects that it needs to insert zero fill dots (zero_fill equals 1) the DWU will stall the DNC while the zero dots are being generated (by de-asserting dwu_dnc_ready), but will allow the data skew block to generate zero fill data (the dwu_ready signal).
    dwu_dnc_ready = ~((buf_full == 1) OR (filllevel == max_write_ahead) OR (zero_fill == 1))
    dwu_ready = ~((buf_full == 1) OR (filllevel == max_write_ahead))
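The two ready equations can be written as a small function to make the difference between them explicit. The function name is hypothetical; the inputs mirror the signals named in the equations.

```python
def ready_signals(buf_full, filllevel, max_write_ahead, zero_fill):
    """Return (dwu_dnc_ready, dwu_ready) as booleans."""
    blocked = bool(buf_full) or filllevel == max_write_ahead
    dwu_ready = not blocked                           # data skew block may proceed
    dwu_dnc_ready = not (blocked or bool(zero_fill))  # DNC may supply a dot
    return dwu_dnc_ready, dwu_ready
```

During zero fill the DNC is stalled while the data skew block keeps running, which is the only case in which the two signals differ.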

[3978] The DWU does not increment the fill level until a complete line of dot data is in DRAM, not just a complete line received from the DNC. This ensures that the LLU cannot start reading a partial line from DRAM before the DWU has finished writing the line.

[3979] The fill level is reset to zero each time a new page is started, on receiving a pulse via the dwu_go_pulse signal.

[3980] The line FIFO fill level can be read by the CPU via the PCU at any time by accessing the FifoFillLevel register.
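
By way of illustration only, the fill level bookkeeping and the two stall equations described in this section can be sketched in Python. The class name FillLevel and its method names are hypothetical and are not part of the described hardware; the signal names (filllevel, max_write_ahead, buf_full, zero_fill, dwu_dnc_ready, dwu_ready) are those used above.

```python
class FillLevel:
    """Toy software model of the DWU line FIFO fill level (hypothetical class)."""

    def __init__(self, max_write_ahead):
        self.max_write_ahead = max_write_ahead
        self.filllevel = 0

    def go_pulse(self):
        # New page: the fill level is reset to zero.
        self.filllevel = 0

    def line_wr(self):
        # A complete line of dot data has reached DRAM.
        self.filllevel += 1

    def line_rd(self):
        # The LLU has signalled that it read a line.
        self.filllevel -= 1

    def ready_signals(self, buf_full, zero_fill):
        """Return (dwu_dnc_ready, dwu_ready) as booleans."""
        stalled = buf_full or self.filllevel == self.max_write_ahead
        dwu_ready = not stalled
        # Zero filling stalls the DNC but not the data skew block.
        dwu_dnc_ready = not (stalled or zero_fill)
        return dwu_dnc_ready, dwu_ready
```

In this sketch a zero fill period leaves dwu_ready asserted while dwu_dnc_ready is de-asserted, which mirrors the distinction drawn between the two signals.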

[3981] 30.9.6 Buffer Address Generator

[3982] 30.9.6.1 Buffer Address Generator Description

[3983] The buffer address generator subblock is responsible for accepting data from the data skew block and writing it to the DIU buffers in the correct order.

[3984] The buffer address and active bit-write for a particular dot data write is calculated by the buffer address generator based on the dot count of the current line, the programmed sense of the color and the line size.

[3985] All configuration registers should be programmed while the Go bit is set to zero. Once programming is complete the block can be enabled by setting the Go bit to one. The transition from zero to one will cause the internal states to reset.

[3986] If the color_line_sense signal for a color is one (i.e. increasing) then the bit-write generation is straightforward, as dot data is aligned with a 256-bit boundary. So for the first dot in that color, bit 0 of the wr_bit bus will be active (in buffer word 0), for the second dot bit 1 is active, and so on to the 255th dot where bit 63 is active (in buffer word 3). This is repeated for all 256-bit words until the final word, where only a partial number of bits are written before the word is transferred to DRAM.

[3987] If the color_line_sense signal for a color is zero (i.e. decreasing) the bit-write generation for that color is adjusted by an offset calculated from the pre-programmed line length (line_size). The offset adjusts the bit write to allow the line to finish on a 256-bit boundary. For example, if the line length was 400, for the first dot received bit 7 of wr_bit is active in buffer word 3 (the line length is halved because of odd/even lines of color), for the second dot bit 6 is active (buffer word 3), and so on to the 200th dot of data, for which bit 0 of wr_bit is active (buffer word 0).
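
The worked example above (line length 400, i.e. 200 dots per half line) can be checked with a short Python sketch. The function name write_position is hypothetical, dots are indexed from zero, and the sketch assumes the half line fits within a single 256-bit word (four 64-bit buffer words); it is not a model of the counters themselves.

```python
def write_position(dot_index, half_line, increasing):
    """Return (buffer_word, bit) for the dot_index-th dot (0-based) of a half line.

    Illustration only: assumes half_line <= 256 so that the whole half line
    lands in one 256-bit word made of four 64-bit buffer words.
    """
    if not 0 <= dot_index < half_line <= 256:
        raise ValueError("illustration only covers a single 256-bit word")
    # Decreasing sense fills from the top position down so the line ends at bit 0.
    position = dot_index if increasing else half_line - 1 - dot_index
    return position // 64, position % 64
```

With half_line equal to 200 and decreasing sense, the first dot maps to buffer word 3, bit 7 and the last dot maps to buffer word 0, bit 0, in agreement with the example.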

[3988] 30.9.6.2 Bit-Write Decode

[3989] The buffer address generator contains 2 instances of the bit-write decode, one configured for odd dot data and the other for even. The counter (either the up or the down counter) used to generate the addresses is selected by the color_line_sense signal. Each block determines if it is active on this cycle by comparing its configured type with the current dot count address and the data_active signal.

[3990] The wr_bit bus is a direct decoding of the lower 6 count bits (count[6:1]), and the DIU buffer address is the remaining higher bits of the counter (count[10:7]).

[3991] The signal generation is given as follows:

    // determine the counter to use
    if (color_line_sense == 1)
        count = up_cnt[10:0]
    else
        count = dn_cnt[10:0]
    // determine if active, based on instance type
    wr_en = data_active & (count[0] ^ odd_even_type)   // odd=1, even=0
    // determine the bit write value
    wr_bit[63:0] = decode(count[6:1])
    // determine the buffer 64-bit address
    wr_adr[3:0] = count[10:7]
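
As a sketch only, the signal generation above can be expressed as a Python function operating on an 11-bit count value. The function name bit_write_decode is hypothetical; the XOR in wr_en is transcribed from the pseudocode as written, and the selection between up_cnt and dn_cnt is assumed to have been made by the caller.

```python
def bit_write_decode(count, data_active, odd_even_type):
    """Return (wr_en, wr_bit, wr_adr) for an 11-bit dot count.

    odd_even_type is 1 for the odd instance and 0 for the even instance.
    """
    count &= 0x7FF                            # count[10:0]
    wr_en = bool(data_active) and bool((count & 1) ^ odd_even_type)
    wr_bit = 1 << ((count >> 1) & 0x3F)       # one-hot decode of count[6:1]
    wr_adr = (count >> 7) & 0xF               # count[10:7]
    return wr_en, wr_bit, wr_adr
```

For example, a count of 131 (binary 00010000011) gives count[6:1] equal to 1, so bit 1 of wr_bit is set, and count[10:7] equal to 1, so buffer address 1 is selected.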

[3992] 30.9.6.3 Up Counter Generator

[3993] The up counter increments for each new dot and is used to determine the write position of the dot in the DIU buffers for increasing sense data. At the end of each line of dot data (as indicated by line_fin), the counter is rounded up to the nearest 256-bit word boundary. This causes the DIU buffers to be flushed to DRAM, including any partially filled 256-bit words. The counter is reset to zero if the dwu_go_pulse is one.

    // Up-Counter Logic
    if (dwu_go_pulse == 1) then
        up_cnt[10:0] = 0
    elsif (line_fin == 1) then
        // round up
        if (up_cnt[8:1] != 0)
            up_cnt[10:9]++
        else
            up_cnt[10:9]
        // bit-selector
        up_cnt[7:0] = 0
    elsif (data_valid == 1) then
        up_cnt[7:0]++

[3994] 30.9.6.4 Down Counter Generator

[3995] The down counter logic decrements for each new dot and is used to determine the write position of the dot in the DIU buffers for decreasing sense data. When the dwu_go_pulse bit is one the lower bits (i.e. 8 to 0) of the counter are reset to the line size value (line_size), and the higher bits to zero. The bits used to determine the bit-write values and 64-bit word addresses in the DIU buffers begin at line size and count down to zero. The remaining higher bits, which are used to determine the DIU buffer 256-bit address and buffer fill level, begin at zero and count up. The counter is active when valid dot data is present, i.e. data_valid equals 1.

[3996] When the end of line is detected (line_fin equals 1) the counter is rounded to the next 256-bit word, and the lower bits are reset to the line size value.

    // Down-Counter Logic
    if (dwu_go_pulse == 1) then
        dn_cnt[8:0] = line_size[8:0]
        dn_cnt[10:9] = 0
    elsif (line_fin == 1) then
        // perform rounding up
        if (dn_cnt[8:1] != 0)
            dn_cnt[10:9]++
        else
            dn_cnt[10:9]
        // bit-select is reset
        dn_cnt[8:0] = line_size[8:0]   // bit select bits
    elsif (data_valid == 1) then
        dn_cnt[8:0]--
        dn_cnt[10:9]++

[3997] 30.9.6.5 Dot Counter

[3998] The dot counter simply counts each active dot received from the data skew block. It sets the counter to line_size and decrements each time a valid dot is received. When the count equals zero the line_fin signal is pulsed and the counter is reset to line_size.

[3999] When the count is less than the max_nozzle_skew*2 value the dot counter indicates to the data skew block to zero fill the remainder of the line (via the zero_fill signal). Note that the max_nozzle_skew units are dot-pairs as opposed to dots, hence the multiplication by 2 for comparison with the dot counter.

[4000] The counter is reset to line_size when dwu_go_pulse is 1.

[4001] 30.9.7 DIU Buffer

[4002] The DIU buffer is a 64 bit × 8 word dual port register array with bit write capability. The buffer could be implemented with flip-flops should it prove more efficient.

[4003] 30.9.8 DIU Interface

[4004] 30.9.8.1 DIU Interface General Description

[4005] The DIU interface determines when a buffer needs a data word to be transferred to DRAM. It generates the DRAM address based on the dot line position, the color base address and the other programmed parameters. A write request is made to DRAM and, when acknowledged, a 256-bit data word is transferred. The interface determines if further words need to be transferred and repeats the transfer process.

[4006] If the FIFO in DRAM has reached its maximum level, or one of the buffers has temporarily filled, the DWU will stall data generation from the DNC.

[4007] A similar process is repeated for each line until the end of page is reached. At the end of a page the CPU is required to reset the internal state of the block before the next page can be printed. A low to high transition of the Go register will cause the internal block reset, which causes all registers in the block to reset with the exception of the configuration registers. The transition is indicated to subblocks by a pulse on the dwu_go_pulse signal.

[4008] 30.9.8.2 Interface Controller

[4009] The interface controller state machine waits in the Idle state until an active request is indicated by the read pointer (via the req_active signal). When an active request is received the machine proceeds to the ColorSelect state to determine which buffers need a data transfer. In the ColorSelect state it cycles through each color and determines if the color is enabled (and consequently the buffer needs servicing). If the color is enabled it jumps to the Request state, otherwise the color_cnt is incremented and the next color is checked.

[4010] In the Request state the machine issues a write request to the DIU and waits in the Request state until the write request is acknowledged by the DIU (diu_dwu_wack). Once an acknowledge is received the state machine clocks through 4 cycles, transferring a 64-bit data word each cycle and incrementing the corresponding buffer read address. After transferring the data to the DIU the machine returns to the ColorSelect state to determine if further buffers need servicing. On the transition the controller indicates to the address generator (adr_update) to update the address for that selected color.

[4011] If all colors are transferred (color_cnt equal to 6) the state machine returns to Idle, updating the last word flags (group_fin) and request logic (req_update).

[4012] The dwu_diu_wvalid signal is a delayed version of the buf_rd_en signal to allow for pipeline delays between data leaving the buffer and being clocked through to the DIU block.

[4013] The state machine will return from any state to Idle if the reset or the dwu_go_pulse is 1.

[4014] 30.9.8.3 Address Generator

[4015] The address generator block maintains 12 pointers (color_adr[11:0]) to DRAM corresponding to the current write address in the dot line store for each half color. When a DRAM transfer occurs the address pointer is used first and then updated for the next transfer for that color. The pointer used is selected by the req_sel bus, and the pointer update is initiated by the adr_update signal from the interface controller.

[4016] The pointer update is dependent on the sense of the color of that pointer, the pointer position in a line and the line position in the FIFO. The programming of the color_base_adr needs to be adjusted depending on the sense of the colors. For increasing sense colors the color_base_adr specifies the address of the first word of the first line of the FIFO, whereas for decreasing sense colors the color_base_adr specifies the address of the last word of the first line of the FIFO.

[4017] For increasing colors, the initialization value (i.e. when dwu_go_pulse is 1) is the color_base_adr.

[4018] For each word that is written to DRAM the pointer is incremented. If the word is the last word in a line (as indicated by last_wd from the read pointer) the pointer is also incremented. If the word is the last word in a line, and the line is the last line in the FIFO (indicated by fifo_end from the line counter), the pointer is reset to color_base_adr.

[4019] In the case of decreasing sense colors, the initialization value (i.e. when dwu_go_pulse is 1) is the color_base_adr. For each line of decreasing sense color data the pointer starts at the line end and decrements to the line start. For each word that is written to DRAM the pointer is decremented. If the word is the last word in a line the pointer is incremented by color_line_inc*2+1: one line length to account for the line of data just written, and another line length for the next line to be written. If the word is the last word in a line, and the line is the last line in the FIFO, the pointer is reset to the initialization value (i.e. color_base_adr).

[4020] The address is calculated as follows:

    if (dwu_go_pulse == 1) then
        color_adr[11:0] = color_base_adr[11:0][21:5]
    elsif (adr_update == 1) then {
        // determine the color
        color = req_sel[3:0]
        // line end and fifo wrap
        if ((fifo_end[color] == 1) AND (last_wd == 1)) then {
            // line end and fifo wrap
            color_adr[color] = color_base_adr[color][21:5]
        }
        elsif (last_wd == 1) then {
            // just a line end no fifo wrap
            if (color_line_sense[color % 2] == 1) then   // increasing sense
                color_adr[color]++
            else                                         // decreasing sense
                color_adr[color] = color_adr[color] + (color_line_inc * 2) + 1
        }
        else {
            // regular word write
            if (color_line_sense[color % 2] == 1) then   // increasing sense
                color_adr[color]++
            else                                         // decreasing sense
                color_adr[color]--
        }
    }
    // select the correct address, for this transfer
    dwu_diu_wadr = color_adr[req_sel]
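
The pointer update for a single half color can be sketched in Python as follows. The function name next_color_adr is hypothetical, and the sketch handles one pointer at a time rather than the 12-entry array. In the usage note the value of color_line_inc is an assumption (taken here as one less than the number of 256-bit words per line), chosen because it makes the decreasing sense jump land on the last word of the next line.

```python
def next_color_adr(adr, base_adr, increasing, last_wd, fifo_end, color_line_inc):
    """Return the DRAM word pointer after one 256-bit word has been written."""
    if last_wd and fifo_end:
        return base_adr                        # line end and FIFO wrap
    if last_wd:
        if increasing:
            return adr + 1                     # next line follows directly
        return adr + color_line_inc * 2 + 1    # jump to the end of the next line
    return adr + 1 if increasing else adr - 1  # regular word write
```

Under that assumption, for a decreasing sense color with 3 words per line (color_line_inc of 2) and a first line occupying words 8 to 10, the pointer steps 10, 9, 8 and then jumps to 13, the last word of the second line.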

[4021] 30.9.8.4 Line Count

[4022] The line counter logic counts the number of dot data lines stored in DRAM for each color. A separate pointer is maintained for each color. A line pointer is updated each time the final word of a line is transferred to DRAM. This is determined by a combination of the adr_update and last_wd signals. The pointer to update is indicated by the req_sel bus.

[4023] When an update occurs to a pointer it is compared to zero. If it is non-zero the count is decremented, otherwise the counter is reset to color_fifo_size. If a counter is zero the fifo_end signal is set high to indicate to the address generator block that the line is the last line of this color's FIFO.

[4024] If the dwu_go_pulse signal is one the counters are reset to color_fifo_size.

    if (dwu_go_pulse == 1) then
        line_cnt[11:0] = color_fifo_size[11:0]
    elsif ((adr_update == 1) AND (last_wd == 1)) then {
        // determine the pointer to operate on
        color = req_sel[3:0]
        // update the pointer
        if (line_cnt[color] == 0) then
            line_cnt[color] = color_fifo_size[color]
        else
            line_cnt[color]--
    }
    // count is zero its the last line of fifo
    for (i = 0; i < 12; i++) {
        fifo_end[i] = (line_cnt[i] == 0)
    }
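
A minimal Python sketch of the line counter behaviour is given below, purely as an illustration. The class name LineCounter and its method names are hypothetical; line_written stands for the cycle in which adr_update and last_wd are both active for the half color selected by req_sel.

```python
class LineCounter:
    """Toy model of the per-half-color line counters (hypothetical class)."""

    def __init__(self, color_fifo_size):
        self.color_fifo_size = list(color_fifo_size)
        self.line_cnt = list(color_fifo_size)   # state after dwu_go_pulse

    def fifo_end(self, color):
        # A zero count marks the last line of this color's FIFO.
        return self.line_cnt[color] == 0

    def line_written(self, color):
        # Called when the final word of a line is transferred to DRAM.
        if self.line_cnt[color] == 0:
            self.line_cnt[color] = self.color_fifo_size[color]
        else:
            self.line_cnt[color] -= 1
```

With color_fifo_size set to 2 for a half color, fifo_end becomes active after two lines have been written and is cleared again once the third line is written.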

[4025] 30.9.8.5 Read Pointer

[4026] The read pointer logic maintains the buffer read address pointers. The read pointer is used to determine which 64-bit words to read from the buffer for transfer to DRAM.

[4027] The read pointer logic compares the read and write pointers of each DIU buffer to determine which buffers require data to be transferred to DRAM, and which buffers are full (the buf_full signal).

[4028] Buffers are grouped into odd and even buffer groups. If an odd buffer requires DRAM access the odd_pend signal will be active; if an even buffer requires DRAM access the even_pend signal will be active. If both odd and even buffers require DRAM access at exactly the same time, the even buffers will get serviced first. If a group of odd buffers is being serviced and an even buffer becomes pending, the odd group of buffers will be completed before starting the even group, and vice versa.

[4029] If any buffer requires a DRAM transfer, the logic will indicate this to the interface controller via the req_active signal, with the odd_even_sel signal determining which group of buffers gets serviced. The interface controller will check the color_enable signal and issue DRAM transfers for all enabled colors in a group. When the transfers are complete it tells the read pointer logic to update the requests pending via the req_update signal.

[4030] The req_sel[3:0] signal tells the address generator which buffer is being serviced. It is constructed from the odd_even_sel signal and the color_cnt[2:0] bus from the interface controller. When data is being transferred to DRAM the word pointer and read pointer for the corresponding buffer are updated. The req_sel determines which pointer should be incremented.

    // determine if request is active even
    if (wr_adr[0][3:2] != rd_adr[0][3:2])
        even_pend = 1
    else
        even_pend = 0
    // determine if request is active odd
    if (wr_adr[1][3:2] != rd_adr[1][3:2])
        odd_pend = 1
    else
        odd_pend = 0
    // determine if any buffer is full
    if (((wr_adr[0][3:0] - rd_adr[0][3:0]) > 7) OR ((wr_adr[1][3:0] - rd_adr[1][3:0]) > 7)) then
        buf_full = 1
    // fixed servicing order, only update when controller dictates so
    if (req_update == 1) then {
        if (even_pend == 1) then       // even always first
            odd_even_sel = 0
            req_active = 1
        elsif (odd_pend == 1) then     // then check odd
            odd_even_sel = 1
            req_active = 1
        else                           // nothing active
            odd_even_sel = 0
            req_active = 0
    }
    // selected requestor
    req_sel[3:0] = {color_cnt[2:0], odd_even_sel}   // concatenation

[4031] The read address pointer logic consists of two 2-bit counters and a word select pointer. The pointers are reset when dwu_go_pulse is one. The word pointer (word_ptr) is common to all buffers and is used to read out the 64-bit words from the DIU buffer. It is incremented when buf_rd_en is active. When a group of buffers is updated the state machine increments the read pointer (rd_ptr[odd_even_sel]) via the group_fin signal. A concatenation of the read pointer and the word pointer is used to construct the buffer read address. The read pointers are not reset at the end of each line.

    // determine which pointer to update
    if (dwu_go_pulse == 1) then
        rd_ptr[1:0] = 0
        word_ptr = 0
    elsif (buf_rd_en == 1) then
        word_ptr++                   // word pointer update
    elsif (group_fin == 1) then
        rd_ptr[odd_even_sel]++       // update the read pointer
    // create the address from the pointer, and word reader
    rd_adr[odd_even_sel] = {rd_ptr[odd_even_sel], word_ptr}   // concatenation

[4032] The read pointer block determines if the word being read from the DIU buffers is the last word of a line. The buffer address generator indicates that the last dot is being written into the buffers via the line_fin signal. When it is received the logic marks the 256-bit word in the buffers as the last word. When the last word is read from the DIU buffer and transferred to DRAM, the flag for that word is reflected to the address generator.

    // line end set the flags
    if (dwu_go_pulse == 1) then
        last_flag[1:0][1:0] = 0
    elsif (line_fin == 1) then
        // determines the current 256-bit word even been written to
        last_flag[0][wr_adr[0][2]] = 1   // even group flag
        // determines the current 256-bit word odd been written to
        last_flag[1][wr_adr[1][2]] = 1   // odd group flag
    // last word reflection to address generator
    last_wd = last_flag[odd_even_sel][rd_ptr[req_sel][0]]
    // clear the flag
    if (group_fin == 1) then
        last_flag[odd_even_sel][rd_ptr[req_sel][0]] = 0

[4033] When a complete line has been written into the DIU buffers (but has not yet been transferred to DRAM), the buffer address generator block will pulse the line_fin signal. The DWU must wait until all enabled buffers are transferred to DRAM before signaling the LLU that a complete line is available in the dot line store (dwu_llu_line_wr signal). When the line_fin is received all buffers will require transfer to DRAM. Due to the arbitration, the even group will get serviced first, then the odd. As a result the line finish pulse to the LLU is generated from the last_flag of the odd group.

    // must be odd, odd group transfer complete and the last word
    dwu_llu_line_wr = odd_even_sel AND group_fin AND last_wd

[4034] 31 Line Loader Unit (LLU)

[4035] 31.1 Overview

[4036] The Line Loader Unit (LLU) reads dot data from the line buffers in DRAM and structures the data into even and odd dot channels destined for the same print time. The blocks of dot data are transferred to the PHI and then to the printhead. FIG. 267 shows a high level data flow diagram of the LLU in context.

[4037] 31.2 Physical Requirement Imposed by the Printhead

[4038] The DWU re-orders dot data into 12 separate dot data line FIFOs in the DRAM. The 12 FIFOs correspond to 6 colors of odd and even data. The LLU reads the dot data line FIFOs and sends the data to the printhead interface. The LLU decides when data should be read from the dot data line FIFOs to correspond with the time that the particular nozzle on the printhead is passing the current line. The interaction of the DWU and LLU with the dot line FIFOs compensates for the physical spread of nozzles firing over several lines at once. For further explanation see Section 30 Dotline Writer Unit (DWU) and Section 32 Printhead Interface (PHI). FIG. 268 shows the physical relationship of nozzle rows and the line time the LLU starts reading from the dot line store.

[4039] Within each line of dot data the LLU is required to generate an even and an odd dot data stream to the PHI block. FIG. 269 shows the even and odd dot streams as they would map to an example bi-lithic printhead. The PHI block determines which stream should be directed to which printhead IC.

[4040] 31.3 Dot Generate and Transmit Order

[4041] The structure of the printhead ICs dictates the dot transmit order to each printhead IC. The LLU reads data from the dot line FIFO and generates an even and an odd dot stream, which is then re-ordered (in the PHI) into the transmit order for transfer to the printhead.

[4042] The DWU separates dot data into even and odd half lines for each color and stores them in DRAM. It can store odd or even dot data in increasing or decreasing order in DRAM. The order is programmable, but for descriptive purposes assume even in increasing order and odd in decreasing order. The dot order structure in DRAM is shown in FIG. 261.

[4043] The LLU contains 2 dot generator units. Each dot generator reads dot data from DRAM and generates a stream of odd or even dots. The dot order may be increasing or decreasing depending on how the DWU was programmed to write data to DRAM. An example of the even and odd dot data streams to DRAM is shown in FIG. 270. In the example the odd dot generator is configured to produce odd dot data in decreasing order and the even dot generator produces dot data in increasing order.

[4044] The PHI block accepts the even and odd dot data streams and reconstructs the streams into transmit order to the printhead.

[4045] The LLU line size refers to the page width in dots and not necessarily the printhead width. The page width is often the dot margin number of dots less than the printhead width. They can be the same size for full bleed printing.

[4046] 31.4 LLU Start-Up

[4047] At the start of a page the LLU must wait for the dot line store in DRAM to fill to a configured level (given by FifoReadThreshold) before starting to read dot data. Once the LLU starts processing dot data for a page it must continue until the end of the page. The DWU (and other PEP blocks in the pipeline) must ensure there is always data in the dot line store for the LLU to read, otherwise the LLU will stall, causing the PHI to stall and potentially generate a print error. The FifoReadThreshold should be chosen to allow for data rate mismatches between the DWU write side and the LLU read side of the dot line FIFO. The LLU will not generate any dot data until the FifoReadThreshold level in the dot line FIFO is reached.

[4048] Once the FifoReadThreshold is reached the LLU begins page processing, and the FifoReadThreshold is ignored from then on.

[4049] When the LLU begins page processing it produces dot data for all colors (although some dot data colors may be null data). The LLU compares the line count of the current page with the configured ColorRelLine value for each color; when the line count exceeds that value for a particular color the LLU will start reading from that color's FIFO in DRAM. For colors that have not exceeded the ColorRelLine value the LLU will generate null data (zero data) and not read from DRAM for that color. ColorRelLine[N] specifies the number of lines separating the Nth half color and the first half color to print on that page. For the example printhead shown in FIG. 268, color 0 odd will start at line 0, and the remaining colors will all have null data. Color 0 odd will continue alone with real data until line 5, when color 0 odd and even will contain real data and the remaining colors will contain null data. At line 10, color 0 odd and even and color 1 odd will contain real data, with the remaining colors containing null data. Every 5 lines a new half color will contain real data, with the remaining half colors containing null data, until line 55, when all colors will contain real data. In the example ColorRelLine[0]=5, ColorRelLine[1]=0, ColorRelLine[2]=15, ColorRelLine[3]=10, etc.
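
The start-up sequence in this example can be illustrated with a few lines of Python. The function name active_half_colors is hypothetical. Two assumptions are made: a half color is treated as carrying real data once the line count is greater than or equal to its ColorRelLine value (which is what the line 0, 5 and 10 cases in the example imply), and the ColorRelLine values beyond index 3 are extrapolated from the stated 5-line spacing rather than taken from the text.

```python
# Index 2c is the even half of color c, index 2c+1 the odd half.
# Entries 0..3 come from the example; the rest are extrapolated (assumption).
EXAMPLE_COLOR_REL_LINE = [5, 0, 15, 10, 25, 20, 35, 30, 45, 40, 55, 50]


def active_half_colors(line, color_rel_line):
    """Return one flag per half color: True for real data, False for null data."""
    return [line >= rel for rel in color_rel_line]
```

At line 0 only index 1 (color 0 odd) is active, at line 10 indices 0, 1 and 3 are active, and from line 55 onwards every half color is active.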

[4050] It is possible to turn off any one of the color planes of data (via the ColorEnable register). In such cases the LLU will generate zeroed dot data information to the PHI as normal but will not read data from the DRAM.

[4051] 31.4.1 LLU Bandwidth Requirements

[4052] The LLU is required to generate data for feeding to the printhead interface; the rate required is dependent on the printhead construction and on the line rate configured. The maximum data rate the LLU can produce is 12 bits of dot data per cycle, but the PHI consumes 12 bits per cycle in only 2 out of every 3 pclk cycles, i.e. 8 bits per pclk cycle on average. Therefore the DRAM bandwidth requirement for a double buffered LLU is 8 bits per cycle on average. If 1.5 buffering is used then the peak bandwidth requirement is doubled to 16 bits per cycle but the average remains at 8 bits per cycle. Note that while the LLU and PHI could produce data at the 8 bits per cycle rate, the DWU can only produce data at a rate of 6 bits per cycle.
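
The arithmetic behind these figures is simple enough to check directly. The following Python fragment is only a worked calculation; the function and variable names are hypothetical, and the doubling applied for 1.5 buffering is taken from the statement above rather than derived.

```python
from fractions import Fraction


def average_bits_per_cycle(bits_per_active_cycle=12, active_cycles=2, window_cycles=3):
    """Average consumption when data moves in only some cycles of a window."""
    return Fraction(bits_per_active_cycle * active_cycles, window_cycles)


average = average_bits_per_cycle()        # 12 bits in 2 of every 3 cycles
peak_with_1_5_buffering = 2 * average     # peak requirement stated as doubled
```

Here average evaluates to 8 bits per cycle and peak_with_1_5_buffering to 16 bits per cycle, matching the values quoted above.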

[4053] 31.5 Vertical Row Skew

[4054] Due to construction limitations of the bi-lithic printhead it is possible that nozzle rows may be misaligned relative to each other. Odd and even rows, and adjacent color rows, may be horizontally misaligned by up to 2 dot positions. Vertical misalignment can also occur between the two printhead ICs used to construct the printhead. The DWU compensates for the horizontal misalignment (see Section 30.5), and the LLU compensates for the vertical misalignment. For each color, odd and even, the LLU maintains 2 pointers into DRAM, one for feeding printhead A (CurrentPtrA) and the other for feeding printhead B (CurrentPtrB). Both pointers are updated and incremented in exactly the same way, but differ in their initial value programming. They differ by the vertical skew number of lines, but point to the same relative position within a line.

[4055] At the start of a line the LLU reads from the FIFO using CurrentPtrA until the join point between the printhead ICs is reached (specified by JoinPoint), after which the LLU reads from DRAM using CurrentPtrB. If the JoinPoint coincides with a 256-bit word boundary, the swap over from pointer A to pointer B is straightforward. If the JoinPoint is not on a 256-bit word boundary, the LLU must read the 256-bit word of data from the CurrentPtrA location, generate the dot data up to the join point, and then read the 256-bit word of data from the CurrentPtrB location and generate dot data from the join point to the word end. This means that if the JoinPoint is not on a 256-bit boundary then the LLU is required to perform an extra read from DRAM at the join point and not increment the address pointers.
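
The read sequence around the join point can be sketched as follows. This is an illustrative Python model only: the function name word_reads and the tuple layout are hypothetical, dots_per_line is assumed to be the number of dots in the half line handled by one dot generator, and pointer wrap within the FIFO is ignored.

```python
WORD_BITS = 256


def word_reads(dots_per_line, join_point):
    """Return (pointer, word_index, first_dot, end_dot) tuples for one line.

    "A" stands for a read via CurrentPtrA and "B" for a read via CurrentPtrB;
    end_dot is exclusive.
    """
    reads = []
    start = 0
    while start < dots_per_line:
        end = min(start + WORD_BITS, dots_per_line)
        word = start // WORD_BITS
        if end <= join_point:
            reads.append(("A", word, start, end))
        elif start >= join_point:
            reads.append(("B", word, start, end))
        else:
            # The join falls inside this word: it is read once per pointer
            # and the word index does not advance between the two reads.
            reads.append(("A", word, start, join_point))
            reads.append(("B", word, join_point, end))
        start = end
    return reads
```

For a 512-dot line, a join point of 256 gives two reads (one per pointer), whereas a join point of 300 gives three reads, the extra one being the second read of word 1 via pointer B.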

[4056] 31.5.1 Dot Line FIFO Initialization

[4057] For each dot line FIFO there are 2 pointers reading from it, each skewed by a number of dot lines in relation to the other (the skew amount could be positive or negative). Determining the exact number of valid lines in the dot line store is complicated by two pointers reading from different positions in the FIFO. It is convenient to remove the problem by pre-zeroing the dot line FIFOs, effectively removing the need to determine exact data validity. The dot FIFOs can be initialized in a number of ways, including:

[4058] the CPU writing 0s,

[4059] the LBD/SFU writing a set of 0 lines (16 bits per cycle),

[4060] the HCU/DNC/DWU being programmed to produce 0 data

[4061] 31.6 Specifying Dot FIFOs

[4062] The dot line FIFOs when accessed by the LLU are specified differently than when accessed by the DWU. The DWU uses a start address and number of lines value to specify a dot FIFO, whereas the LLU uses a start and end address for each dot FIFO. The mechanisms differ to allow more efficient implementations in each block.

[4063] The start address for each half color N is specified by the ColorBaseAdr[N] registers and the end address (actually the end address plus 1) is specified by ColorBaseAdr[N+1]. Note there are 12 colors in total, 0 to 11; the ColorBaseAdr[12] register specifies the end of the color 11 dot FIFO and not the start of a new dot FIFO. As a result the dot FIFOs must be specified contiguously and increasing in DRAM.
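
As an illustration of this addressing scheme, the following Python sketch derives the start and end-plus-one address of each dot FIFO from a list of 13 ColorBaseAdr values. The function name fifo_ranges is hypothetical, and addresses are treated as plain integers in units of 256-bit words.

```python
def fifo_ranges(color_base_adr):
    """Return 12 (start, end_plus_1) pairs from 13 ColorBaseAdr values."""
    if len(color_base_adr) != 13:
        raise ValueError("expected ColorBaseAdr[0] to ColorBaseAdr[12]")
    # Contiguous and increasing: each FIFO ends where the next one starts.
    if any(a > b for a, b in zip(color_base_adr, color_base_adr[1:])):
        raise ValueError("ColorBaseAdr values must be increasing")
    return [(color_base_adr[n], color_base_adr[n + 1]) for n in range(12)]
```

For example, with base addresses spaced 0x100 words apart starting at zero, the FIFO for half color 11 runs from 0xB00 up to, but not including, 0xC00, the value held in ColorBaseAdr[12].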

[4064] 31.7 Implementation

[4065] 31.7.1 LLU Partition

[4066] 31.7.2 Definitions of I/O

TABLE 208 LLU I/O definition

Port name | Pins | I/O | Description
Clocks and Resets
pclk | 1 | In | System clock
prst_n | 1 | In | System reset, synchronous active low
PHI Interface
llu_phi_data[1:0][5:0] | 2 × 6 | Out | Dot data from LLU to the PHI, each bit is a color plane 5 downto 0. Bus 0 - Even dot data stream. Bus 1 - Odd dot data stream. Data is active when the corresponding bit is active in the llu_phi_avail bus
phi_llu_ready[1:0] | 2 | In | Indicates that PHI is ready to accept data from the LLU. 0 - Even dot data stream. 1 - Odd dot data stream
llu_phi_avail[1:0] | 2 | Out | Indicates valid data present on corresponding llu_phi_data. 0 - Even dot data stream. 1 - Odd dot data stream
DIU Interface
llu_diu_rreq | 1 | Out | LLU requests DRAM read. A read request must be accompanied by a valid read address.
llu_diu_radr[21:5] | 17 | Out | Read address to DIU, 17 bits wide (256-bit aligned word).
diu_llu_rack | 1 | In | Acknowledge from DIU that read request has been accepted and new read address can be placed on llu_diu_radr
diu_data[63:0] | 64 | In | Data from DIU to LLU. Each access is 256 bits received over 4 clock cycles. First 64 bits is bits 63:0 of 256-bit word. Second 64 bits is bits 127:64 of 256-bit word. Third 64 bits is bits 191:128 of 256-bit word. Fourth 64 bits is bits 255:192 of 256-bit word
diu_llu_rvalid | 1 | In | Signal from DIU telling LLU that valid read data is on the diu_data bus
DWU Interface
dwu_llu_line_wr | 1 | In | DWU line write. Indicates that the DWU has completed a full line write. Active high
llu_dwu_line_rd | 1 | Out | LLU line read. Indicates that the LLU has completed a line read. Active high.
PCU Interface
pcu_llu_sel | 1 | In | Block select from the PCU. When pcu_llu_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[7:2] | 6 | In | PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
llu_pcu_rdy | 1 | Out | Ready signal to the PCU. When llu_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on llu_pcu_datain is valid.
llu_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU.

[4067] 31.7.3 Configuration Registers

[4068] The configuration registers in the LLU are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for a description of the protocol and timing diagrams for reading and writing registers in the LLU. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the LLU. When reading a register that is less than 32 bits wide zeros should be returned on the upper unused bit(s) of llu_pcu_datain. Table 209 lists the configuration registers in the LLU.

TABLE 209 LLU registers description

Address (LLU_base+) | Register | #bits | Reset | Description
Control Registers
0x00 | Reset | 1 | 0x1 | Active low synchronous reset, self de-activating. A write to this register will cause a LLU block reset.
0x04 | Go | 1 | 0x0 | Active high bit indicating the LLU is programmed and ready to use. A low to high transition will cause LLU block internal states to reset.
Configuration
0x08-0x38 | ColorBaseAdr[12:0][21:5] | 13 × 17 | 0x00000 | Specifies the base address (in words) in memory where data from a particular half color (N) will be placed. Also specifies the end address + 1 (256-bit words) in memory where fifo data for a particular half color ends. For color N the start address is ColorBaseAdr[N] and the end address + 1 is ColorBaseAdr[N+1]
0x3C | ColorEnable | 6 | 0x3F | Indicates whether a particular color is active or not. When inactive no data is written to DRAM for that color. 0 - Color off. 1 - Color on. One bit per color, bit 0 is Color 0 and so on.
0x40 | LineSize | 16 | 0x0000 | Indicates the number of dots per line.
0x44 | FifoReadThreshold | 8 | 0x00 | Specifies the number of lines that should be in the FIFO before the LLU starts reading.
0x48-0x74 | ColorRelLine[11:0] | 12 × 8 | 0x00 | Specifies the relative number of lines to wait from the first before starting to read dot data from the corresponding dot data FIFO. Bus 0,1 - Even, Odd line color 0. Bus 2,3 - Even, Odd line color 1. Bus 4,5 - Even, Odd line color 2. Bus 6,7 - Even, Odd line color 3. Bus 8,9 - Even, Odd line color 4. Bus 10,11 - Even, Odd line color 5
0x78-0x7C | JoinPoint | 2 × 16 | 0x0000 | Specifies the join point in dots between both printhead ICs. Bus 0 - Even dot generator join point. Bus 1 - Odd dot generator join point
0x80-0x84 | JoinWord | 2 × 8 | 0x00 | Specifies the join point in words between both printhead ICs. Bus 0 - Even dot generator join point. Bus 1 - Odd dot generator join point
0x90-0xBC | CurrentAdrA[11:0][21:5] | 12 × 17 | 0x0000 | Current address pointers associated with printhead A. Bus 0,1 - Even, Odd line color 0. Bus 2,3 - Even, Odd line color 1. Bus 4,5 - Even, Odd line color 2. Bus 6,7 - Even, Odd line color 3. Bus 8,9 - Even, Odd line color 4. Bus 10,11 - Even, Odd line color 5. Working registers
0xC0-0xEC | CurrentAdrB[11:0][21:5] | 12 × 17 | 0x0000 | Current address pointers associated with printhead B. Bus 0,1 - Even, Odd line color 0. Bus 2,3 - Even, Odd line color 1. Bus 4,5 - Even, Odd line color 2. Bus 6,7 - Even, Odd line color 3. Bus 8,9 - Even, Odd line color 4. Bus 10,11 - Even, Odd line color 5. Working registers
Working Registers
0xF0 | FifoFillLevel | 8 | 0x00 | Number of lines in the dot line FIFO, line written in but not read out. (Read Only)

[4069] A low to high transition of the Go register causes the internal states of the LLU to be reset. All configuration registers will remain the same. The block indicates the transition to other blocks via the llu_go_pulse signal.

[4070] 31.7.4 Dot Generator

[4071] The dot generator block is responsible for reading dot data from the DIU buffers and sending the dot data in the correct order to the PHI block. The dot generator waits for the llu_en signal from the fifo fill level block; once that signal is active, the dot generator starts reading data from the 6 DIU buffers and generating dot data for feeding to the PHI.

[4072] In the LLU there are two instances of the dot generator, one generating odd data and the other generating even data.

[4073] At any time the ready bit from the PHI could be de-asserted. If this happens the dot generator will stop generating data, and wait for the ready bit to be re-asserted.

[4074] 31.7.4.1 Dot Count

[4075] In normal operation the dot counter will wait for the llu_en and the ready to be active before starting to count. The dot count will produce data as long as the phi_llu_ready is active. If the phi_llu_ready signal goes low the count will be stalled.

[4076] The dot counter increments for each dot that is processed per line. It is used to determine the line finish position, and the bit select value for reading from the DIU buffers. The counter is reset after each line is processed (line_fin signal). It determines when a line is finished by comparing the dot count with the configured line size divided by 2 (note that odd numbers of dots will be rounded down).

    // define the line finish
    if (dot_cnt[14:0] == line_size[15:1]) then
        line_fin = 1
    else
        line_fin = 0
    // determine if word is valid
    dot_active = ((llu_en == 1) AND (phi_llu_ready == 1) AND (buf_emp == 0))
    // counter logic
    if (llu_go_pulse == 1) then
        dot_cnt = 0
    elsif ((dot_active == 1) AND (line_fin == 1)) then
        dot_cnt = 0
    elsif (dot_active == 1) then
        dot_cnt = dot_cnt + 1
    else
        dot_cnt = dot_cnt
    // calculate the word select bits
    bit_sel[5:0] := dot_cnt[5:0]
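The counter logic above can be exercised in software. The following Python sketch is a minimal behavioural model of one clock cycle of the dot counter, not the hardware implementation. The function name step_dot_counter, the argument order and the use of Python booleans for the signals are assumptions made for illustration only.

```python
def step_dot_counter(dot_cnt, line_size, llu_go_pulse, llu_en,
                     phi_llu_ready, buf_emp):
    """Model one clock cycle; returns (next_dot_cnt, line_fin, bit_sel)."""
    # line finish: compare against line_size[15:1], i.e. line_size // 2
    line_fin = dot_cnt == (line_size >> 1)
    # a word is only valid when enabled, the PHI is ready and data is buffered
    dot_active = bool(llu_en and phi_llu_ready and not buf_emp)
    # word select bits are the low 6 bits of the counter
    bit_sel = dot_cnt & 0x3F

    if llu_go_pulse:
        next_cnt = 0
    elif dot_active and line_fin:
        next_cnt = 0
    elif dot_active:
        next_cnt = (dot_cnt + 1) & 0x7FFF  # dot_cnt is 15 bits wide
    else:
        next_cnt = dot_cnt                 # stalled
    return next_cnt, line_fin, bit_sel


if __name__ == "__main__":
    cnt = 0
    for cycle in range(6):
        cnt, fin, sel = step_dot_counter(cnt, 8, False, True, True, False)
        print(cycle, cnt, fin, sel)
```

With a line size of 8 the model counts up to 4, flags line_fin on that cycle and returns to 0; de-asserting phi_llu_ready holds the count where it is.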

[4077] The dot generator also maintains a read buffer pointer which is incremented each time a 64-bit word is processed. The pointer is used to address the correct 64-bit dot data word within the DIU buffers. The pointer is reset when llu_go_pulse is 1. Unlike the dot counter the read pointer is not reset each line but rounded up to the nearest 256-bit word. This allows for more efficient use of the DIU buffers at line finish.

[4078] When the dot counter reaches the join point for the dot generator (join_point), it jumps to the next 256 bit word in the DIU buffer but continues to read from the next bit position within that word. If the join point coincides with a word boundary, no 256-bit increment is required.

    // read pointer logic
    if (llu_go_pulse == 1) then
        read_adr = 0
    elsif ((dot_active == 1) AND ((dot_cnt[7:0] == 255) OR (line_fin == 1))) then
        // end of line round up
        read_adr[3:2]++
        read_adr[1:0] = 0
    elsif ((dot_active == 1) AND (dot_cnt == join_point) AND (dot_cnt[5:0] == 63)) then
        // join point jump 256 bits
        read_adr[1:0]++    // regular increment
        read_adr[3:2]++    // join point 256 increment
    elsif ((dot_active == 1) AND (dot_cnt == join_point) AND (dot_cnt[5:0] != 63)) then
        // join point jump 256 bits, bottom bits remain the same
        read_adr[3:2]++    // join point 256 increment only
    elsif ((dot_active == 1) AND (dot_cnt[5:0] == 63)) then
        read_adr[3:0]++    // regular increment

[4079] 31.7.5 Fifo Fill Level

[4080] The LLU keeps a running total of the number of lines in the dot line store FIFO. Every time the DWU signals a line end (dwu_llu_line_wr active pulse) it increments the filllevel. Conversely if the LLU detects a line end (line_rd pulse) the filllevel is decremented and the line read is signalled to the DWU via the llu_dwu_line_rd signal.

[4081] The LLU fill level block is used to determine when the dot line has enough data stored before the LLU should begin reading. The LLU at page start is disabled. It waits for the DWU to write lines to the dot line FIFO, and for the fill level to increase. The LLU remains disabled until the fill level has reached the programmed threshold (fifo read thres). When the threshold is reached it signals the LLU to start processing the page by setting llu_en high. Once the LLU has started processing dot data for a page it will not stop if the filllevel falls below the threshold, but will stall if filllevel falls to zero.

[4082] The line fifo filllevel can be read by the CPU via the PCU at any time by accessing the FifoFillLevel register. The CPU must toggle the Go register in the LLU for the block to be correctly initialized at page start and the fifo level reset to zero.

    if (llu_go_pulse == 1) then
        filllevel = 0
    elsif ((line_rd == 1) AND (dwu_llu_line_wr == 1)) then
        // do nothing
    elsif (line_rd == 1) then
        filllevel--
    elsif (dwu_llu_line_wr == 1) then
        filllevel++
    // determine the threshold, and set the LLU going
    if ((llu_go_pulse == 1) OR (filllevel == 0)) then
        llu_en = 0
    elsif (filllevel == fifo_read_threshold) then
        llu_en = 1
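The fill level behaviour described above (start at the threshold, keep running below it, stall at zero) can be illustrated with a small software model. This is a sketch only: the class name FifoFillLevel and its clock method are hypothetical, and the model evaluates the enable on the already-updated fill level within the same call, which is a simplifying assumption about timing rather than something the text specifies.

```python
class FifoFillLevel:
    """Behavioural sketch of the LLU fifo fill level block (illustrative)."""

    def __init__(self, fifo_read_threshold):
        self.fifo_read_threshold = fifo_read_threshold
        self.filllevel = 0
        self.llu_en = False

    def clock(self, llu_go_pulse=False, line_rd=False, dwu_llu_line_wr=False):
        # fill level update
        if llu_go_pulse:
            self.filllevel = 0
        elif line_rd and dwu_llu_line_wr:
            pass                      # simultaneous read and write cancel out
        elif line_rd:
            self.filllevel -= 1
        elif dwu_llu_line_wr:
            self.filllevel += 1
        # enable logic: off on Go or when empty, on once the threshold is met
        if llu_go_pulse or self.filllevel == 0:
            self.llu_en = False
        elif self.filllevel == self.fifo_read_threshold:
            self.llu_en = True
        return self.llu_en


if __name__ == "__main__":
    fifo = FifoFillLevel(fifo_read_threshold=2)
    for event in ("wr", "wr", "rd", "rd"):
        en = fifo.clock(line_rd=(event == "rd"),
                        dwu_llu_line_wr=(event == "wr"))
        print(event, fifo.filllevel, en)
```

In the usage above the enable stays low after the first written line, goes high at two lines, remains high when the level drops to one, and falls again when the FIFO empties.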

[4083] 31.7.6 DIU Interface

[4084] 31.7.6.1 DIU Interface Description

[4085] The DIU interface block is responsible for determining when dot data needs to be read from DRAM, keeping the dot generators supplied with data and calculating the DRAM read address based on configured parameters, FIFO fill levels and position in a line.

[4086] The fill level block enables DIU requests by activating the llu_en signal. The DIU interface controller then issues requests to the DIU for the LLU buffers to be filled with dot line data (or fills the LLU buffers with null data without requesting DRAM access, if required).

[4087] At page start the DIU interface determines which buffers should be filled with null data and which should request DRAM access. New requests are issued until the dot line is completely read from DRAM.

[4088] For each request to the DRAM the address generator calculates where in the DRAM the dot data should be read from. The color_enable bus determines which colors are enabled; the interface never issues DRAM requests for disabled colors.

[4089] 31.7.6.2 Interface Controller

[4090] The interface controller co-ordinates and issues requests for data transfers from DRAM. The state machine waits in Idle state until it is enabled by the LLU controller (llu_en) and a request for data transfer is received from the write pointer block.

[4091] When an active request is received (req_active equals 1) the state machine jumps to the ColorSelect state to determine which colors (color_cnt) in the group need a data transfer. A group is defined as all odd colors or all even colors. If the color isn't enabled (color_enable) the count just increments, and no data is transferred. If the color is enabled, the state machine takes one of two options, either a null data transfer or an actual data transfer from DRAM. A null data transfer writes zero data to the DIU buffer and does not issue a request to DRAM.

[4092] The state machine determines if a null transfer is required by checking the color_start signal for that color.

[4093] If a null transfer is required the state machine doesn't need to issue a request to the DIU and so jumps directly to the data transfer states (Data0 to Data3). The machine clocks through the 4 states, each time writing a null 64-bit data word to the buffer. Once complete the state machine returns to the ColorSelect state to determine if further transfers are required.

[4094] If the color_start is active then a data transfer is required. The state machine jumps to the Request state and issues a request to the DIU controller for DRAM access by setting llu_diu_rreq high. The DIU responds by acknowledging the request (diu_llu_rack equals 1) and then sending 4 64-bit words of data. The transition from Request to Data0 state signals the address generator to update the address pointer (adr_update). The state machine clocks through Data0 to Data3 states, each time writing the 64-bit data into the buffer selected by the req_sel bus. Once complete the state machine returns to the ColorSelect state to determine if further transfers are required.

[4095] When in the ColorSelect state and all data transfers for colors in that group have been serviced (i.e. when color_cnt is 6) the state machine will return to the Idle state. On transition it will update the word counter logic (word_dec) and enable the request logic (req_update).

[4096] A reset or llu_go_pulse set to 1 will cause the state machine to jump directly to Idle. The controller will remain in Idle state until it is enabled by the LLU controller via the llu_en signal. This prevents the DIU attempting to fill the DIU buffers before the dot line store FIFO has filled over its threshold level.

31.7.6.3 Color Activate

[4097] The color activate logic maintains an absolute line count indicating the line number currently being processed by the LLU. The counter is reset when the llu_go_pulse is 1 and incremented each time a line_rd pulse is received. The count value (line_cnt) is used to determine when to start reading data for a color.

[4098] The count is implemented as follows:

    if (llu_go_pulse == 1) then
        line_cnt = 0
    elsif (line_rd == 1) then
        line_cnt++

[4099] The color activate logic compares line count with the relative line value to determine when the LLU should start reading data from DRAM for a particular half color. It signals the interface controller block which colors are active for this dot line in a page (via the color_start bus). It is used by the interface controller to determine which DIU buffers require null data.

[4100] Once the color_start bit for a color is set it cannot be cleared in the normal page processing process. The bits must be reset by the CPU at the end of a page by transitioning the Go bit and causing a pulse on the llu_go_pulse signal.

[4101] Any color not enabled by the color_enable bus will never have its color_start bit set.

    for (i=0; i<12; i++) {
        if (llu_go_pulse == 1) then
            col_on[i] = 0
        elsif (color_enable[i % 6] == 0) then    // color disabled, never activate
            col_on[i] = 0
        elsif (line_cnt == color_rel_line[i]) then
            col_on[i] = 1
    }
    // select either odd or even colors
    if (odd_even_sel == 1) then    // odd selected
        color_start[5:0] = {col_on[11], col_on[9], col_on[7], col_on[5], col_on[3], col_on[1]}
    else                           // even selected
        color_start[5:0] = {col_on[10], col_on[8], col_on[6], col_on[4], col_on[2], col_on[0]}
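As an illustration of the color activate behaviour, the following Python sketch models the sticky col_on bits and the odd/even selection. It assumes, as the prose states, that a color whose enable bit is clear never has its start bit set; the i % 6 indexing of color_enable is copied from the listing rather than independently verified. The function names update_col_on and color_start are hypothetical.

```python
def update_col_on(col_on, line_cnt, color_enable, color_rel_line,
                  llu_go_pulse=False):
    """Return the next 12 col_on bits (one per half color)."""
    nxt = list(col_on)
    for i in range(12):
        if llu_go_pulse:
            nxt[i] = 0
        elif not (color_enable >> (i % 6)) & 1:
            nxt[i] = 0                 # disabled colors are never activated
        elif line_cnt == color_rel_line[i]:
            nxt[i] = 1                 # sticky until the next Go pulse
    return nxt


def color_start(col_on, odd_even_sel):
    """Select the odd (1) or even (0) entries; list index k is bit k."""
    first = 1 if odd_even_sel else 0
    return [col_on[i] for i in range(first, 12, 2)]


if __name__ == "__main__":
    rel_line = [0, 5] * 6              # even half colors start 5 lines early
    bits = [0] * 12
    for line_cnt in range(6):
        bits = update_col_on(bits, line_cnt, 0x3F, rel_line)
        print(line_cnt, color_start(bits, 0), color_start(bits, 1))
```

In this usage the even half colors become active on line 0 and the odd half colors on line 5, so for lines 0 to 4 the odd buffers would receive null data.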

[4102] 31.7.6.4 Address Generator

[4103] The address generator block maintains 24 pointers (current_adr_a[11:0] and current_adr_b[11:0]) to DRAM corresponding to 2 read addresses in the dot line FIFO for each half color. The current_adr_a group of pointers are used when the dot generator is feeding printhead channel A, and the current_adr_b group of pointers are used when the dot generator is feeding printhead channel B. For each DRAM access the 2 address pointers are updated but only one can be used for an access. The word counter block determines which pointer group should be used to access DRAM, via the pointer select signals (ptr_sel). In certain cases (e.g. the join point is not 256-bit aligned and the word is on the join point) the address pointers should not be updated for an access; the word counter block determines the exception cases and indicates to the address generator to skip the update via the join_stall signal.

[4104] When a DRAM transfer occurs the address pointer is used first and then updated for the next transfer for the color. The pointer used is selected by the req_sel and ptr_sel buses, and the pointer update is initiated by the adr_update signal from the interface controller.

[4105] The address update is calculated as follows (pointer group A logic is shown but the same logic is used to update the B pointer group a clock cycle later):

    // update the A pointers
    if (ptra_wr_en == 1) then
        // write from the configuration block
        current_adr_a[ptr_adr] = ptr_wr_data
    elsif (adr_update_a == 1) then {
        // address update from state machine
        if ((req_sel == NULL) OR (join_stall == 1)) then
            // do nothing
        else
            // temporary variable setup
            next_adr = current_adr_a[req_sel] + 1
            start_adr = color_base_adr[req_sel]
            end_adr = color_base_adr[req_sel + 1]
            // determine how to update the pointer
            if (next_adr == end_adr) then
                current_adr_a[req_sel] = start_adr
            else
                current_adr_a[req_sel] = next_adr
    }
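The pointer update above is a circular-buffer increment bounded by two consecutive ColorBaseAdr entries. The short Python sketch below shows that wrap-around in isolation. The function name next_pointer is hypothetical, req_sel is modelled as an integer index or None for the NULL case, and the base addresses used in the example are made-up values chosen only to show the wrap.

```python
def next_pointer(current_adr, color_base_adr, req_sel, join_stall=False):
    """Return the updated address pointer for the half color req_sel."""
    if req_sel is None or join_stall:
        return current_adr                 # no update for this access
    next_adr = current_adr + 1
    start_adr = color_base_adr[req_sel]
    end_adr = color_base_adr[req_sel + 1]  # end address + 1 of this region
    # wrap back to the start of the region when the end is reached
    return start_adr if next_adr == end_adr else next_adr


if __name__ == "__main__":
    # 13 illustrative base addresses: 12 regions of 4 words each
    base = [4 * n for n in range(13)]
    adr = base[1]
    for _ in range(6):
        print(adr)
        adr = next_pointer(adr, base, req_sel=1)
```

Because the end address of region N is stored as the start address of region N+1, 13 base registers are enough to bound all 12 half-color regions.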

[4106] The correct address to use for a transfer is selected by the ptr_sel signals from the word counter block. They indicate which set of address pointers should be used based on the current word being transferred from the DRAM and the configured join point values (join_word).

    // select the address pointer to use for access
    if (req_sel[0] == 1) then    // odd pointer selector
        if (ptr_sel[1] == 1) then
            llu_diu_radr = current_adr_b[req_sel]    // latter part of line
        else
            llu_diu_radr = current_adr_a[req_sel]    // former part of line
    else                         // even pointer selector
        if (ptr_sel[0] == 1) then
            llu_diu_radr = current_adr_b[req_sel]    // latter part of line
        else
            llu_diu_radr = current_adr_a[req_sel]    // former part of line

[4107] 31.7.6.5 Write Pointer

[4108] The write pointer logic maintains the buffer write address pointers, determines when the DIU buffers need a data transfer and signals when the DIU buffers are empty. The write pointer determines the address in the DIU buffer that the data should be transferred to.

[4109] The write pointer logic compares the read and write pointers of each DIU buffer to determine which buffers require data to be transferred from DRAM, and which buffers are empty (the buf_emp signals).

[4110] Buffers are grouped into odd and even buffers. If an odd buffer requires DRAM access the odd_pend signals will be active; if an even buffer requires DRAM access the even_pend signals will be active. If both odd and even buffers require DRAM access at exactly the same time, the even buffers will get serviced first. If a group of odd buffers are being serviced and an even buffer becomes pending, the odd group of buffers will be completed before starting the even group, and vice versa.

[4111] If any buffer requires a DRAM transfer, the logic will indicate this to the interface controller via the req_active signal, with the odd_even_sel signal determining which group of buffers get serviced. The interface controller will check the color_enable signal and issue DRAM transfers for all enabled colors in a group. When the transfers are complete it tells the write pointer logic to update the request pending via the req_update signal.

[4112] The req_sel[3:0] signal tells the address generator which buffer is being serviced. It is constructed from the odd_even_sel signal and the color_cnt[2:0] bus from the interface controller. When data is being transferred from DRAM the word pointer and write pointer for the corresponding buffer are updated. The req_sel determines which pointer should be incremented.

[4113] The write pointer logic operates the same way regardless of whether the transfer is null or not.

    // determine which buffers need updates
    buf_emp[1:0] = 0
    odd_pend = 0
    even_pend = 0
    if (wr_adr[0][3:2] == rd_adr[0][3:2])
        even_pend = 1
    if (wr_adr[1][3:2] == rd_adr[1][3:2])
        odd_pend = 1
    // determine if buffers are empty
    if ((wr_adr[0][3:0] == rd_adr[0][3:0])) then
        buf_emp[0] = 1
    if ((wr_adr[1][3:0] == rd_adr[1][3:0])) then
        buf_emp[1] = 1
    // fixed servicing order, only update when controller dictates so
    if (req_update == 1) then {
        if (even_pend == 1) then      // even always first
            odd_even_sel = 0
            req_active = 1
        elsif (odd_pend == 1) then    // then check odd
            odd_even_sel = 1
            req_active = 1
        else                          // nothing active
            odd_even_sel = 0
            req_active = 0
    }
    // selected requestor
    req_sel[3:0] = {color_cnt[2:0], odd_even_sel}    // concatenation

[4114] The write address pointer logic consists of 2 2-bit counters and a word select pointer. The counters are reset when llu_go_pulse is one. The word pointer (word_ptr) is common to all buffers and is used to write 64-bit words into the DIU buffer. It is incremented when buf_rd_en is active.

[4115] When a group of buffers are updated the state machine increments the write pointer (wr_ptr[odd_even_sel]) via the group_fin signal. A concatenation of the write pointer and the word pointer is used to construct the buffer write address. The write pointers are not reset at the end of each line.

    // determine which pointer to update
    if (llu_go_pulse == 1) then
        wr_ptr[1:0] = 0
        word_ptr = 0
    elsif (buf_rd_en == 1) then
        word_ptr++
        wr_en[req_sel] = 1
    elsif (group_fin == 1) then
        wr_ptr[odd_even_sel]++
    // create the address from the write pointer and word pointer
    wr_adr[odd_even_sel] = {wr_ptr[odd_even_sel], word_ptr}    // concatenation

[4116] 31.7.6.6 Word Count

[4117] The word count logic maintains 2 counters to track the number of words transferred from DRAM per line, one counter for odd data, and one counter for even. On receipt of a llu_go_pulse, the counters are initialized to a join_word value (number of words to the join point for that printhead channel) and the pointer select values to zero (ptr_sel). When a group of words are transferred from DRAM as indicated by the word_dec signal from the interface controller, the corresponding counter is decremented. The counter to decrement is indicated by the odd_even_sel signal from the write pointer block (even=0, odd=1).

[4118] When a counter is zero and the ptr_sel is zero, the counter is re-initialized to the second join_word value and ptr_sel is inverted. The counter continues to count down to zero each time a word_dec signal is received. When a counter is zero and the ptr_sel is one, it signals the end of a line (the last_wd signal) and initializes the counter to the first join_point value for the next line transfer.

[4119] The ptr_sel signal is used in the address generator to select the correct address pointer to use for that particular access.

    // determine which counter to decrement
    if (llu_go_pulse == 1) then
        word_cnt[0] = join_word[0]    // even count
        ptr_sel[0] = 0                // even generator starts with pointer A
        word_cnt[1] = join_word[1]    // odd count
        ptr_sel[1] = 0                // odd generator starts with pointer A
    elsif (word_dec == 1) then {
        // need to decrement one word counter
        if (odd_even_sel == 0) then
            // even counter update
            if (word_cnt[0] == 0) then
                word_cnt[0] = join_word[ptr_sel[0]]    // re-initialize pointer
                ptr_sel[0] = ~(ptr_sel[0])
                if (ptr_sel[0] == 1) then              // determine if this the last word
                    last_wd = 1
            else
                word_cnt[0]--                          // normal decrement
        else
            // odd counter update
            if (word_cnt[1] == 0) then
                word_cnt[1] = join_word[ptr_sel[1]]    // re-initialize pointer
                ptr_sel[1] = ~(ptr_sel[1])
                if (ptr_sel[1] == 1) then              // determine if this the last word
                    last_wd = 1
            else
                word_cnt[1]--                          // normal decrement
    }

[4120] The word count logic also determines if the current word to be transferred is the join word, and if so it determines if it is aligned on a 256-bit boundary or not. If the join point is aligned to a boundary there is no need to prevent the address counter from incrementing, otherwise the address pointers are stalled for that word transfer (join_stall).

    join_stall = (((ptr_sel[0] == 0) AND (word_cnt[0] == 0) AND (join_point[0][7:0] != 0))
              AND ((ptr_sel[1] == 0) AND (word_cnt[1] == 0) AND (join_point[1][7:0] != 0)))

[4121] The word count logic also determines when a complete line has been read from DRAM. It then signals the fifo fill level logic in both the LLU and DWU (via line_rd signal) that a complete line has been read by the LLU (llu_dwu_line_rd).

    // line finish logic
    if (llu_go_pulse == 1) then
        line_fin = 0
        line_rd = 0
    elsif ((last_wd == 1) AND (line_fin == 0)) then
        line_fin = 1    // first group last_wd finish pulse
        line_rd = 0
    elsif ((last_wd == 1) AND (line_fin == 1)) then
        line_fin = 0    // second group last_wd finish pulse
        line_rd = 1
    else
        line_fin = line_fin    // stay the same
        line_rd = 0

[4122] 32 Printhead Interface (PHI)

[4123] 32.1 Overview

[4124] The Printhead interface (PHI) accepts dot data from the LLU and transmits the dot data to the printhead, using the printhead interface mechanism. The PHI generates the control and timing signals necessary to load and drive the bi-lithic printhead. The CPU determines the line update rate to the printhead and adjusts the line sync frequency to produce the maximum print speed to account for the printhead IC's size ratio and inherent latencies in the syncing system across multiple SoPECs.

[4125] The PHI also needs to consider the order in which dot data is loaded in the printhead. This is dependent on the construction of the printhead and the relative sizes of printhead ICs used to create the printhead. See Bi-lithic Printhead Reference document for a complete description of printhead types [10].

[4126] The printing process is a real-time process. Once the printing process has started, the next printline's data must be transferred to the printhead before the next line sync pulse is received by the printhead. Otherwise the printing process will terminate with a buffer underrun error.

[4127] The PHI can be configured to drive a single printhead IC with or without synchronization to other SoPECs. For example the PHI could drive a single IC printhead (i.e. a printhead constructed with one IC only), or dual IC printhead with one SoPEC device driving each printhead IC.

[4128] The PHI interface provides a mechanism for the CPU to directly control the PHI interface pins, allowing the CPU to access the bi-lithic printhead to:

[4129] determine printhead temperature

[4130] test for and determine dead nozzles for each printhead IC

[4131] initialize each printhead IC

[4132] pre-heat each printhead IC

[4133] FIG. 277 shows a high level data flow diagram of the PHI in context.

[4134] 32.2 Printhead Modes of Operation

[4135] The printhead has 8 different modes of operation (although some modes are re-used). The mode of operation is defined by the state of the output pins phi_lsyncl and phi_readl and the internal printhead mode register. The modes of operation are defined in Table 210.

TABLE 210 Printhead modes of operation

Name | Internal Mode | phi_readl | phi_lsyncl | State | Description
NORMAL | XXX | 1 | 1 | N/A | Normal print mode, dot data is clocked into the printhead shift register, on each falling edge of phi_srclk
DOT_LOAD/FIRE_INIT | XXX | 1 | 0 | phi_frclk=0 | Dot Load Mode, data stored in the dot shift register is transferred into the dot latch on the falling edge of phi_lsyncl, and latched in on the rising edge of phi_lsyncl
DOT_LOAD/FIRE_INIT | XXX | 1 | 0 | phi_srclk=1 | Fire load mode. Parameter for generating fire pattern are loaded into generator, data on phi_ph_data[1:0][0] is clocked into the generator on each rising edge of phi_frclk
NOZZLE_RESET | 001 | 0 | 1 | N/A | Reset Nozzle Test mode. Reset the state on nozzle test.
CMOS_TEST | 111 | 0 | 1 | N/A | CMOS test mode.
FIRE_GEN | 000 | 0 | 1 | N/A | Fire Initialise mode. The initialised generator creates the fire pattern and shift select pattern. The pattern is clocked into the fire shift register and select shift register on the rising edge of phi_frclk
TEMP_TEST | 010 | 0 | 0 | N/A | Temperature test output.
NOZZLE_TEST | 001 | 0 | 0 | N/A | Nozzle test output. The result of a nozzle test is output on phi_frclk_i.

[4136] 32.3 Data Rate Equalization

[4137] The LLU can generate dot data at the rate of 12 bits per cycle, where a cycle is at the system clock frequency. In order to achieve the target print rate of 30 sheets per minute, the printhead needs to print a line every 100 μs (calculated from 2 seconds per sheet divided by the number of lines in 300 mm @ 65.2 dots/mm = ~100 μsec). For a 7:3 constructed printhead this means that 9744 cycles at 320 MHz is quick enough to transfer the 6-bit dot data (at 2 bits per cycle). The input FIFOs are used to de-couple the read and write clock domains as well as provide for differences between consume and fill rates of the PHI and LLU.
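The timing budget above can be checked with a few lines of arithmetic. The Python sketch below works through one reading of the figures, under these assumptions: 30 sheets per minute means 2 seconds per sheet, the sheet is 300 mm long at 65.2 lines per mm, the longer IC of a 7:3 printhead takes 9744 dots per line, each dot carries 6 bits, and the printhead interface moves 2 bits per doclk cycle at 320 MHz. The constant names are illustrative only.

```python
SHEET_TIME_S = 60 / 30        # 30 sheets per minute
PAGE_LENGTH_MM = 300
LINES_PER_MM = 65.2           # same pitch as dots/mm
DOCLK_HZ = 320e6
DOTS_LONG_IC = 9744           # longer IC of a 7:3 printhead (assumed reading)
BITS_PER_DOT = 6
BITS_PER_DOCLK = 2

# time available to print one line
lines_per_sheet = PAGE_LENGTH_MM * LINES_PER_MM
line_time_us = SHEET_TIME_S / lines_per_sheet * 1e6

# time needed to shift one line into the longer IC
transfer_cycles = DOTS_LONG_IC * BITS_PER_DOT // BITS_PER_DOCLK
transfer_time_us = transfer_cycles / DOCLK_HZ * 1e6

print(f"line time     : {line_time_us:.2f} us")
print(f"transfer time : {transfer_time_us:.2f} us ({transfer_cycles} doclk cycles)")
print("fits in a line time:", transfer_time_us < line_time_us)
```

Under these assumptions the line time comes out at roughly 102 μs and the transfer to the longer IC at roughly 91 μs, which is consistent with the statement that the transfer is quick enough.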

[4138] Nominally the system clock (pclk) is run at 160 MHz and the printhead interface clock (doclk) is at 320 MHz.

[4139] If the PHI were to transfer data at the full printhead interface rate, the transfer of data to the shorter printhead IC would be completed sooner than the transfer to the longer printhead IC. While this isn't an issue in itself, it requires that the LLU be able to supply data at the maximum rate for short durations, which in turn requires uneven, bursty access to DRAM, which is undesirable. To smooth the LLU DRAM access requirements over time the PHI transfers dot data to the printhead at a pre-programmed rate, proportional to the ratio of the shorter to longer printhead ICs.

[4140] The printhead data rate equalization is controlled by PrintheadRate[1:0] registers (one per printhead IC). The register is a 16 bit bitmap of active clock cycles in a 16 clock cycle window. For example if the register is set to 0xFFFF then the output rate to the printhead will be full rate; if it's set to 0xF0F0 then the output rate is 50%, where there are 4 active cycles followed by 4 inactive cycles and so on. If the register was set to 0x0000 the rate would be 0%. The relative data transfer rate of the printhead can be varied from 0-100% with a granularity of 1/16 steps.

TABLE 211 Example rate equalization values for common printheads

Printhead Ratio A:B | Printhead A rate (%) | Printhead B rate (%)
8:2 | 0xFFFF (100%) | 0x1111 (25%)
7:3 | 0xFFFF (100%) | 0x5551 (43.7%)
6:4 | 0xFFFF (100%) | 0xF1F2 (68.7%)
5:5 | 0xFFFF (100%) | 0xFFFF (100%)
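Since the register is a bitmap over a 16 cycle window, the effective rate is simply the number of set bits divided by 16. The following Python sketch illustrates this. The function names are hypothetical, and the choice of consuming the bitmap from the most significant bit first is an assumption made so that 0xF0F0 yields 4 active cycles followed by 4 inactive cycles, as in the example above; the text does not state the bit order.

```python
def rate_fraction(printhead_rate):
    """Fraction of active cycles encoded by a 16-bit PrintheadRate value."""
    return bin(printhead_rate & 0xFFFF).count("1") / 16


def active_cycles(printhead_rate, n_cycles):
    """Per-cycle enable pattern, repeating every 16 cycles (MSB first, assumed)."""
    return [bool((printhead_rate >> (15 - (c % 16))) & 1)
            for c in range(n_cycles)]


if __name__ == "__main__":
    for value in (0xFFFF, 0xF0F0, 0x1111, 0x0000):
        print(f"0x{value:04X}: {rate_fraction(value) * 100:.2f}%")
    print(active_cycles(0xF0F0, 8))
```

A rate close to the ratio of the shorter to the longer IC can be chosen by picking the nearest multiple of 1/16 and spreading the set bits as evenly as possible across the window.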

[4141] If both printhead ICs are the same size (e.g. a 5:5 printhead) it may be desirable to reduce the data rate to both printhead ICs, to reduce the read bandwidth from the DRAM.

[4142] 32.4 Dot Generate and Transmit Order

[4143] Several printhead types and arrangements exist (see [10] for other arrangements). The PHI is capable of driving all possible configurations, but for the purposes of simplicity only one arrangement (arrangement 1 - see [10] for definition) is described in the following examples. The structure of the printhead ICs dictates the dot transmit order to each printhead IC. The PHI accepts two streams of dot data from the LLU, one even stream the other odd. The PHI constructs the dot transmit order streams from the dot generate order received from the LLU. Each stream of data has already been arranged in increasing or decreasing dot order sense by the DWU. The exact sense choice is dependent on the type of printhead ICs used to construct the printhead, but regardless of configuration the odd and even stream should be of opposing sense. The dot transmit order is shown in FIG. 281. Dot data is shifted into the printhead in the direction of the arrow, so from the diagram (taking the type 0 printhead IC) even dot data is transferred in increasing order to the mid point first (0, 2, 4, . . . , m−6, m−4, m−2), then odd dot data in decreasing order is transferred (m−1, m−3, m−5, . . . , 5, 3, 1). For the type 1 printhead IC the order is reversed, with odd dots in increasing order transmitted first, followed by even dot data in decreasing order. Note for any given color the odd and even dot data transferred to the printhead ICs are from different dot lines; in the example in the diagram they are separated by 5 dot lines. Table 212 shows the transmit dot order for some common A4 printheads. Different type printheads may have the sense reversed and may have an odd before even transmit order or vice versa.

TABLE 212 Example printhead ICs, and dot data transmit order for A4 (13824 dots) page

Size | Dots | Dot Order
Type 0 Printhead IC
8 | 11160 | 0, 2, 4, 6 . . . , 5574, 5576, 5578; 5579, 5577, 5575 . . . 7, 5, 3, 1
7 | 9744 | 0, 2, 4, 6 . . . , 4866, 4868, 4870; 4871, 4869, 4867 . . . 7, 5, 3, 1
6 | 8328 | 0, 2, 4, 6 . . . , 4158, 4160, 4162; 4163, 4161, 4159 . . . 7, 5, 3, 1
5 | 6912 | 0, 2, 4, 6 . . . , 3450, 3452, 3454; 3455, 3453, 3451 . . . 7, 5, 3, 1
4 | 5496 | 0, 2, 4, 6 . . . , 2742, 2744, 2746; 2747, 2745, 2743 . . . 7, 5, 3, 1
3 | 4080 | 0, 2, 4, 6 . . . , 2034, 2036, 2038; 2039, 2037, 2035 . . . 7, 5, 3, 1
2 | 2664 | 0, 2, 4, 6 . . . , 1326, 1328, 1330; 1331, 1329, 1327 . . . 7, 5, 3, 1
Type 1 Printhead IC
8 | 11160 | 13823, 13821, 13819 . . . , 1337, 1335, 1333; 1332, 1334, 1336 . . . 13818, 13820, 13822
7 | 9744 | 13823, 13821, 13819 . . . , 2045, 2043, 2041; 2040, 2042, 2044 . . . 13818, 13820, 13822
6 | 8328 | 13823, 13821, 13819 . . . , 2753, 2751, 2749; 2748, 2750, 2752 . . . 13818, 13820, 13822
5 | 6912 | 13823, 13821, 13819 . . . , 3461, 3459, 3457; 3456, 3458, 3460 . . . 13818, 13820, 13822
4 | 5496 | 13823, 13821, 13819 . . . , 4169, 4167, 4165; 4164, 4166, 4168 . . . 13818, 13820, 13822
3 | 4080 | 13823, 13821, 13819 . . . , 4877, 4875, 4873; 4872, 4874, 4876 . . . 13818, 13820, 13822
2 | 2664 | 13823, 13821, 13819 . . . , 5585, 5583, 5581; 5580, 5582, 5584 . . . 13818, 13820, 13822
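The transmit orders tabulated above follow a simple pattern that can be generated programmatically. The Python sketch below is an illustration only: it follows the sequences as they appear in Table 212 (type 0: even dots increasing then odd dots decreasing; type 1: odd dots decreasing then even dots increasing), and the function names and the parameters m (mid point), start and total are hypothetical. Both m and start are assumed to be even.

```python
def type0_order(m):
    """Even dots 0..m-2 in increasing order, then odd dots m-1..1 decreasing."""
    return list(range(0, m, 2)) + list(range(m - 1, 0, -2))


def type1_order(start, total=13824):
    """Odd dots total-1..start+1 decreasing, then even dots start..total-2."""
    return list(range(total - 1, start, -2)) + list(range(start, total, 2))


if __name__ == "__main__":
    t0 = type0_order(5580)
    t1 = type1_order(1332)
    print(t0[:4], "...", t0[2787:2793], "...", t0[-4:])
    print(t1[:3], "...", t1[6243:6249], "...", t1[-3:])
    # a type 0 IC ending at dot 1331 and a type 1 IC starting at dot 1332
    # together cover every dot of a 13824-dot line exactly once
    covered = sorted(type0_order(1332) + type1_order(1332))
    print(covered == list(range(13824)))
```

The final check shows why the two ICs of a printhead use complementary boundaries: the type 0 sequence stops at the dot immediately before the first dot of the type 1 sequence.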

[4144] 32.4.1 Dual Printhead IC

[4145] The LLU contains 2 dot generator units. Each dot generator reads dot data from DRAM and generates a stream of dots in increasing or decreasing order. A dot generator can be configured to produce odd or even dot data streams, and the dot sense is also configurable. In FIG. 281 the odd dot generator is configured to produce odd dot data in decreasing order and the even dot generator produces dot data in increasing order. The LLU takes care of any vertical misalignment between the 2 printhead ICs, presenting the PHI with the appropriate data ready to be transmitted to the printhead.

[4146] In order to reconstruct the dot data streams from the generate order to the transmit order, the connection between the generators and transmitters needs to be switched at the mid point. At line start the odd dot generator feeds the type 1 printhead, and the even dot generator feeds the type 0 printhead. This continues until both printheads have received half the number of dots they require (defined as the mid point). The mid point is calculated from the configured printhead size registers (PrintheadSize). Once both printheads have reached the mid point, the PHI switches the connections between the dot generators and the printhead, so now the odd dot generator feeds the type 0 printhead and the even dot generator feeds the type 1 printhead. This continues until the end of the line.

[4147] It is possible that both printheads will not be the same size and as a result one dot generator may reach the mid point before the other. In such cases the quicker dot generator is stalled until both dot generators reach the mid point; the connections are then switched and both dot generators are restarted.

[4148] Note that in the example shown in FIG. 281 the dot generators could generate an A4 line of data in 6912 cycles, but because of the mismatch in the printhead IC sizes the transmit time takes 9744 cycles.

[4149] 32.4.2 Single Printhead IC

[4150] In some cases only one printhead IC may be connected to the PHI. In FIG. 282 the dot generate and transmit order is shown for a single IC printhead of 9744 dots width. While the example shows the printhead IC connected to channel A, either channel could be used. The LLU generates odd and even dot streams as normal; it has no knowledge of the physical printhead configuration. The PHI is configured with the printhead size (PrintheadSize[1] register) for channel B set to zero and channel A set to 9744.

[4151] Note that in the example shown in FIG. 283 the dot generators could generate a 7 inch line of data in 4872 cycles, but because the printhead is using one IC, the transmit time takes 9744 cycles, the same speed as an A4 line with a 7:3 printhead.

[4152] 32.4.3 Summary of Generate and Transmit Order Requirements

[4153] In order to support all the possible printhead arrangements, the PHI (in conjunction with the LLU/DWU) must be capable of re-ordering the bits according to the following criteria:

[4154] Be able to output the even or odd plane first.

[4155] Be able to output even and odd planes independently.

[4156] Be able to reverse the sequence in which the color planes of a single dot are output to the printhead.

[4157] 32.5 Print Sequence

[4158] The PHI is responsible for accepting dot data streams from theLLU, restructuring the dot data sequence and transferring the dot datato each printhead within a line time (i.e before the next line sync).

[4159] Before a page can be printed the printhead ICs must beinitialized. The exact initialization sequence is configurationdependent, but will involve the fire pattern generation initializationand other optional steps. The initialization sequence is implemented insoftware.

[4160] Once the first line of data has been transferred to the printhead, the PHI will interrupt the CPU by asserting the phi_icu_print_rdy signal. The interrupt can be optionally masked in the ICU and the CPU can poll the signal via the PCU or the ICU. The CPU must wait for a print ready signal in all printing SoPECs before starting printing.

[4161] Once the CPU in the PrintMaster SoPEC is satisfied that printing should start, it triggers the LineSyncMaster SoPEC by writing to the PrintStart register of all printing SoPECs. The transition of the PrintStart register in the LineSyncMaster SoPEC will trigger the start of lsyncl pulse generation. The PrintMaster and LineSyncMaster SoPEC are not necessarily the same device, but often are. For a more in-depth definition see section 12.1.1 Multi-SoPEC systems on page 105.

[4162] Writing a 1 to the PrintStart register enables the generation of the line sync in the LineSyncMaster, which is in turn used to align all SoPECs in a multi-SoPEC system. All printhead signaling is aligned to the line sync. The PrintStart is only used to align the first line sync in a page.

[4163] When a SoPEC receives a line sync pulse it means that the line previously transferred to the printhead is now printing, so the PHI can begin to transfer the next line of data to the printhead. When the transfer is complete the PHI will wait for the next line sync pulse before repeating the cycle. If a line sync arrives before a complete line is transferred to the printhead (i.e. a buffer error) the PHI generates a buffer underrun interrupt, and halts the block.

[4164] For each line in a page the PHI must transfer a full line of data to the printhead before the next line sync is generated or received.

[4165] 32.5.1 Sync Pulse Control

[4166] If the PHI is configured as the LineSyncMaster SoPEC it will start generating line sync signals LsyncPre number of pclk cycles after a rising transition of the PrintStart register is detected. All other signals in the PHI interface are referenced from the rising edge of the phi_lsyncl signal.

[4167] If the SoPEC is in line sync slave mode it will receive a line sync pulse from the LineSyncMaster SoPEC through the phi_lsyncl pin, which will be programmed into input mode. The phi_lsyncl input pin is treated as an asynchronous input and is passed through a de-glitch circuit of programmable de-glitch duration (LsyncDeglitchCnt).

[4168] The phi_lsyncl will remain low for LsyncLow cycles, and then high for LsyncHigh cycles. The phi_lsyncl profile is repeated until the page is complete. The period of the phi_lsyncl is given by LsyncLow+LsyncHigh cycles. Note that the LsyncPre value is only used to vary the time between the generation of the first phi_lsyncl and the PageStart indication from the CPU. See FIG. 284 for reference diagram.

[4169] If the SoPEC device is in line sync slave mode, the LsyncHigh register specifies the minimum allowed phi_lsyncl period. Any phi_lsyncl pulses received before the LsyncHigh has expired will trigger a buffer underrun error.
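The two uses of the line sync timing registers described above can be sketched as follows. This is a minimal illustration, not the PHI implementation: the function names are hypothetical, pulse arrival times are expressed in pclk cycles, and a pulse arriving exactly LsyncHigh cycles after its predecessor is assumed to be acceptable.

```python
def lsync_period(lsync_low, lsync_high):
    """Master mode: phi_lsyncl period in pclk cycles."""
    return lsync_low + lsync_high


def first_early_pulse(pulse_times, lsync_high):
    """Slave mode: index of the first pulse that arrives fewer than
    lsync_high cycles after its predecessor, or None if all are valid."""
    for i in range(1, len(pulse_times)):
        if pulse_times[i] - pulse_times[i - 1] < lsync_high:
            return i   # this pulse would trigger a buffer underrun error
    return None


# Hypothetical values, chosen only to exercise the functions.
print(lsync_period(100, 9900))                       # 10000
print(first_early_pulse([0, 10000, 20000], 9000))    # None
print(first_early_pulse([0, 10000, 15000], 9000))    # 2
```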

[4170] 32.5.2 Shift Register Signal Control

[4171] Once the PHI receives the line sync pulse, the sequence of data transfer to the printhead begins. All PHI control signals are specified from the rising edge of the line sync.

[4172] The phi_srclk (and consequently phi_ph_data) is controlled by the SrclkPre and SrclkPost registers. The SrclkPre specifies the number of pclk cycles to wait before beginning to transfer data to the printhead. Once data transfer has started, the profile of the phi_srclk is controlled by the PrintheadRate register and the status of the PHI input FIFO. For example, it is possible that the input FIFO could empty and no data would be transferred to the printhead while the PHI was waiting. After all the data for a printhead is transferred to the PHI, it counts SrclkPost number of pclk cycles. If a new phi_lsyncl falling edge arrives before the count is complete the PHI will generate a buffer underrun interrupt (phi_icu_underrun).

[4173] 32.5.3 Firing Sequence Signal Control

[4174] The profile of the phi_frclk pulses per line is determined by 4 registers: FrclkPre, FrclkLow, FrclkHigh and FrclkNum. The FrclkPre register specifies the number of cycles between the line sync rising edge and the phi_frclk pulse going high. It remains high for FrclkHigh cycles and then low for FrclkLow cycles. The number of pulses generated per line is determined by the FrclkNum register. The total number of cycles required to complete a firing sequence should be less than the phi_lsyncl period, i.e. ((FrclkHigh+FrclkLow)*FrclkNum)+FrclkPre<(LsyncLow+LsyncHigh). Note that when in CPU direct control mode (PrintheadCpuCtrl=1) and PrintheadCpuCtrlMode[x]=1, the frclk generator is triggered by the transition of the FireGenSoftTrigger[0] bit from 0 to 1. FIG. 284 details the timing parameters controlling the PHI. All timing parameters are measured in number of pclk cycles.
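The firing sequence constraint stated above can be checked with a small helper. The sketch below is illustrative only: the function names are hypothetical, the register values in the example are invented, and the waveform is an idealised one that ignores any cycles the state machine may spend loading counters.

```python
def firing_cycles(frclk_pre, frclk_high, frclk_low, frclk_num):
    """Total pclk cycles from line sync rising edge to end of the last pulse."""
    return (frclk_high + frclk_low) * frclk_num + frclk_pre


def firing_fits(frclk_pre, frclk_high, frclk_low, frclk_num,
                lsync_low, lsync_high):
    """True if the firing sequence is shorter than the phi_lsyncl period."""
    total = firing_cycles(frclk_pre, frclk_high, frclk_low, frclk_num)
    return total < lsync_low + lsync_high


def frclk_profile(frclk_pre, frclk_high, frclk_low, frclk_num):
    """Idealised phi_frclk level for each pclk cycle after the line sync edge."""
    return [0] * frclk_pre + ([1] * frclk_high + [0] * frclk_low) * frclk_num


# Hypothetical register values, chosen only to exercise the check.
print(firing_cycles(10, 4, 6, 5))          # 60
print(firing_fits(10, 4, 6, 5, 20, 50))    # True
print(firing_fits(10, 4, 6, 5, 20, 40))    # False
```

The comparison is strict, matching the "less than" wording of the constraint.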

[4175] 32.5.4 Page Complete

[4176] The PHI counts the number of lines processed through the interface. The line count is initialised to the PageLenLine value and decrements each time a line is processed. When the line count is zero it pulses the phi_icu_page_finish signal. A pulse on the phi_icu_page_finish automatically resets the PHI Go register, and can optionally cause an interrupt to the CPU. Should the page terminate abnormally, i.e. a buffer underrun, the Go register will be reset and an interrupt generated.

[4177] 32.5.5 Line Sync Interrupt

[4178] The PHI will generate an interrupt to the CPU after a predefined number of line syncs have occurred. The number of line syncs to count is configured by the LineSyncInterrupt register. The interrupt can be disabled by setting the register to zero.

[4179] 32.6 Dot Line Margin

[4180] The PHI block allows the generation of margins either side of the received page from the LLU block. This allows the page width used within PEP blocks to differ from the physical printhead size.

[4181] This allows SoPEC to store data for a page minus the margins, resulting in lower storage requirements in the shared DRAM and reduced memory bandwidth requirements. The difference between the dot data line size and the line length generated by the PHI is the dot line margin length. There are two margins specified for any sheet, one margin per printhead IC side.

[4182] The margin value is set by programming the DotMargin register per printhead IC. It should be noted that the DotMargin register represents half the width of the actual margin (either left or right margin depending on paper flow direction). For example, if the margin is 1 inch (1600 dots), then DotMargin should be set to 800. The reason for this is that the PHI only supports margin creation cases 1 and 3 described below.

[4183] See example in FIG. 284.

[4184] In the example the margin for the type 0 printhead IC is set at 100 dots (DotMargin==100), implying an actual margin of 200 dots.

[4185] If case 1 is used the PHI takes a total of 9744 phi_srclk cycles to load the dot data into the type 0 printhead. It also requires 9744 dots of data from the LLU, which in turn gets read from the DRAM. In this case the first 100 and last 100 dots would be zero but are processed through the SoPEC system, consuming memory and DRAM bandwidth at each step.

[4186] In case 2 the LLU no longer generates the margin dots; the PHI generates the zeroed out dots for the margining. The phi_srclk still needs to toggle 9744 times per line, although the LLU only needs to generate 9544 dots, giving the reduction in DRAM storage and associated bandwidth.

[4187] The case 2 scenario is not supported by the PHI because the same effect can be achieved by means of case 1 and case 3.

[4188] If case 3 is used the benefits of case 2 are achieved, but the phi_srclk no longer needs to toggle the full 9744 clock cycles. The phi_srclk cycles count can be reduced by the margin amount (in this case 9744-100=9644 dots), and due to the reduction in phi_srclk cycles the phi_lsyncl period could also be reduced, increasing the line processing rate and consequently increasing print speed. Case 3 works by shifting the odd (or even) dots of a margin from line Y to become the even (or odd) dots of the margin for line Y-4 (Y-5 adjusted due to being printed one line later). This works for all lines with the exception of the first line, where there has been no previous line to generate the zeroed out margin. This situation is handled by adding the line reset sequence to the printhead initialization procedure, and is repeated between pages of a document.
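The figures quoted for the three margin cases can be reproduced with the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, and the number of LLU dots in case 3 is taken to equal that of case 2 on the basis of the statement that case 3 achieves the benefits of case 2.

```python
def dot_margin_register(margin_dots):
    """DotMargin holds half the width of the actual margin."""
    return margin_dots // 2


def margin_cases(physical_size, dot_margin):
    """Return {case: (phi_srclk_cycles, llu_dots)} for one printhead IC."""
    data_dots = physical_size - 2 * dot_margin   # dots excluding both margin halves
    return {
        1: (physical_size, physical_size),       # LLU supplies zeroed margin dots
        2: (physical_size, data_dots),           # PHI inserts margin dots (unsupported)
        3: (physical_size - dot_margin, data_dots),
    }


print(dot_margin_register(1600))   # 800
print(margin_cases(9744, 100))
# {1: (9744, 9744), 2: (9744, 9544), 3: (9644, 9544)}
```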

[4189] 32.7 Dot Counter

[4190] The PHI keeps a dot usage count for each of the color planes (called AccumDotCount). If a dot is used in a particular color plane the corresponding counter is incremented. Each counter is 32 bits wide and saturates if not reset. A write to the DotCountSnap register causes the AccumDotCount[N] values to be transferred to the DotCount[N] registers (where N is 5 to 0, one per color). The AccumDotCount registers are cleared on value transfer.

[4191] The DotCount[N] registers can be written to or read from by the CPU at any time. On reset the counters are reset to zero.

[4192] The dot counter only counts dots that are passed from the LLU through the PHI to the printhead.

[4193] Any dots generated by direct CPU control of the PHI pins will not be counted.

[4194] 32.8 CPU IO Control

[4195] The PHI interface provides a mechanism for the CPU to directly control the PHI interface pins, allowing the CPU to access the bi-lithic printhead for the following:

[4196] Determine printhead temperature

[4197] Test for and determine dead nozzles for each printhead IC

[4198] Printhead IC initialization

[4199] Printhead pre-heat function

[4200] The CPU can gain direct control of the printhead interface connections by setting the PrintheadCpuCtrl register to one. Once enabled, the printhead bits are driven directly by the PrintheadCpuOut control register, where the values in the register are reflected directly on the printhead pins, and the status of the printhead input pins can be read directly from the PrintheadCpuIn register. The direction of pins is controlled by programming the PrintheadCpuDir register.

[4201] The register to pin mapping is as follows:

TABLE 213 CPU control and status registers mapping to printhead interface

Register Name | Bits | Printhead pin
PrintHeadCpuOut | 0 | phi_lsyncl_o
PrintHeadCpuOut | 1 | phi_frclk_o
PrintHeadCpuOut | 2 | Reserved
PrintHeadCpuOut | 4:3 | phi_ph_data_o[0][1:0]
PrintHeadCpuOut | 6:5 | phi_ph_data_o[1][1:0]
PrintHeadCpuOut | 8:7 | phi_srclk[1:0]
PrintHeadCpuOut | 9 | phi_readl
PrintHeadCpuDir | 0 | phi_lsyncl_e direction control (1 - output mode, 0 - input mode)
PrintHeadCpuDir | 1 | phi_frclk_e direction control (1 - output mode, 0 - input mode)
PrintHeadCpuDir | 2 | Reserved
PrintHeadCpuIn | 0 | phi_lsyncl_i
PrintHeadCpuIn | 1 | phi_frclk_i
PrintHeadCpuIn | 2 | Reserved
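The PrintHeadCpuOut bit assignments of Table 213 can be expressed as a small decoder. The sketch below is for illustration only: the function name and the dictionary keys are assumptions, and bit 2 is ignored because it is reserved.

```python
def decode_cpu_out(value):
    """Map a 10 bit PrintHeadCpuOut value to the pin levels of Table 213."""
    return {
        "phi_lsyncl_o": value & 1,               # bit 0
        "phi_frclk_o": (value >> 1) & 1,         # bit 1
        "phi_ph_data_o[0]": (value >> 3) & 0b11, # bits 4:3
        "phi_ph_data_o[1]": (value >> 5) & 0b11, # bits 6:5
        "phi_srclk": (value >> 7) & 0b11,        # bits 8:7
        "phi_readl": (value >> 9) & 1,           # bit 9
    }


# Hypothetical register value: readl=1, srclk=0b10, data[1]=0b01, data[0]=0b11.
example = (1 << 9) | (0b10 << 7) | (0b01 << 5) | (0b11 << 3) | (1 << 1) | 1
print(decode_cpu_out(example))
```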

[4202] It is important to note that once in PrintheadCpuCtrl mode it is the responsibility of the CPU to drive the printhead correctly and not create situations where the printhead could be destroyed, such as activating all nozzles together.

[4203] The phi_srclk is a double data rate clock (DDR) and as such will clock data on both edges in the printhead.

[4204] Note the following procedures are based on current printhead capabilities, and are subject to change.

[4205] 32.9 Implementation

[4206] 32.9.1 Definitions of I/O

TABLE 214 Printhead interface I/O definition

Port name | Pins | I/O | Description

Clocks and Resets
Pclk | 1 | In | System Clock
Doclk | 1 | In | Data out clock (2× pclk) used to transfer data to printhead
prst_n | 1 | In | System reset, synchronous active low. Synchronous to pclk
dorst_n | 1 | In | System reset, synchronous active low. Synchronous to doclk

General
phi_icu_print_rdy | 1 | Out | Indicates that the first line of data is transferred to the printhead. Active high.
phi_icu_page_finish | 1 | Out | Indicates that data for a complete page has transferred. Active high
phi_icu_underrun | 1 | Out | Indicates the PHI has detected a buffer underrun. Active high
phi_icu_linesync_int | 1 | Out | Indicates the PHI has detected LineSyncInterrupt number of line syncs.

Debug
debug_data_valid | 1 | In | Output debug data valid to be muxed on to the PHI pin
debug_cntrl | 1 | In | Control signal for the PHI to indicate whether or not the debug data valid (and pclk) should be selected by the pin mux. Active high.

LLU Interface
llu_phi_data[1:0][5:0] | 2 × 6 | In | Dot Data from LLU to the PHI, each bit is a color plane 5 down to 0. Bus 0 - Even dot data stream. Bus 1 - Odd dot data stream. Data is active when corresponding bit is active in llu_phi_avail bus
phi_llu_ready[1:0] | 2 | Out | Indicates that PHI is ready to accept data from the LLU. 0 - Even dot data stream. 1 - Odd dot data stream
llu_phi_avail[1:0] | 2 | In | Indicates valid data present on corresponding llu_phi_data. 0 - Even dot data stream. 1 - Odd dot data stream

Printhead Interface
phi_ph_data[1:0][1:0] | 2 × 2 | Out | Dot data output to printhead. Each bus to each printhead contains 2 bits of data. Bus 0 - Printhead channel A. Bus 1 - Printhead channel B
phi_srclk[1:0] | 2 | Out | Dot data shift clock used to clock in printhead data; data is shifted on both edges of clock (i.e. double data rate DDR). Bus 0 - Printhead channel A. Bus 1 - Printhead channel B
phi_readl | 1 | Out | Common printhead mode control. Used in conjunction with phi_lsyncl to determine the printhead mode. 0 - SoPEC receiving, printhead driving. 1 - SoPEC driving, printhead receiving
phi_frclk_o | 1 | Out | Common Fire pattern clock, needs to toggle once per fire cycle
phi_frclk_e | 1 | In | phi_frclk_o output enable; when high the phi_frclk_o pin is driving
phi_frclk_i | 1 | In | phi_frclk_i input from printhead
phi_lsyncl_o | 1 | Out | Capture dot data for next print line, output mode
phi_lsyncl_e | 1 | In | phi_lsyncl output enable; when high the phi_lsyncl pin is driving
phi_lsyncl_i | 1 | In | Line Sync Pulse from Master SoPEC

PCU Interface
pcu_phi_sel | 1 | In | Block select from the PCU. When pcu_phi_sel is high both pcu_adr and pcu_dataout are valid.
pcu_rwn | 1 | In | Common read/not-write signal from the PCU.
pcu_adr[7:2] | 6 | In | PCU address bus. Only 6 bits are required to decode the address space for this block.
pcu_dataout[31:0] | 32 | In | Shared write data bus from the PCU.
phi_pcu_rdy | 1 | Out | Ready signal to the PCU. When phi_pcu_rdy is high it indicates the last cycle of the access. For a write cycle this means pcu_dataout has been registered by the block and for a read cycle this means the data on phi_pcu_datain is valid.
phi_pcu_datain[31:0] | 32 | Out | Read data bus to the PCU.

[4207] 32.9.2 PHI Sub-Block Partition

[4208] 32.9.3 Configuration Registers

[4209] The configuration registers in the PHI are programmed via the PCU interface. Refer to section 21.8.2 on page 321 for a description of the protocol and timing diagrams for reading and writing registers in the PHI. Note that since addresses in SoPEC are byte aligned and the PCU only supports 32-bit register reads and writes, the lower 2 bits of the PCU address bus are not required to decode the address space for the PHI. When reading a register that is less than 32 bits wide, zeros should be returned on the upper unused bit(s) of phi_pcu_datain. Table 215 lists the configuration registers in the PHI.

TABLE 215 PHI registers description

Address (PHI_base+) | Register | #bits | Reset | Description

Control Registers
0x00 | Reset | 1 | 0x1 | Active low synchronous reset, self de-activating. A write to this register will cause a PHI block reset.
0x04 | Go | 1 | 0x0 | Active high bit indicating the PHI is programmed and ready to use. A low to high transition will cause PHI block internal state to reset. Will be automatically reset if a page finish or a buffer underrun is detected.

General Control
0x08 | PageLenLine | 32 | 0x0000_0000 | Specifies the number of dot lines in a page. Indicates the number of lines left to process in this page while the PHI is running (Working register)
0x0c | PrintStart | 1 | 0x0 | A high level enables printing to start via the generation of line syncs in a master, and acceptance of line syncs in a slave. Can be set in advance of the print ready signal.
0x10-0x14 | DotMargin[1:0] | 2 × 16 | 0x0000 | Specifies for each printhead IC the width of the margin in dots divided by 2. Value must be divisible by 2 (i.e. the low bit must be 0). 0 - Printhead IC Channel A. 1 - Printhead IC Channel B
0x18-0x2C | DotCount[5:0] | 6 × 32 | 0x0000_0000 | Indicates the number of dots used for a particular color, where N specifies a color from 0 to 5. Value valid after a write access to DotCountSnap
0x30 | DotCountSnap | 1 | 0x0 | Write access causes the AccumDotCount values to be transferred to the DotCount registers. The AccumDotCount are reset afterwards. (Reads as zero)
0x34 | PhiHeadSwap | 1 | 0x0 | Controls which signals are connected to printhead channels A and B. 0 - Normal, specifies bit 0 is channel A, bit 1 is channel B. 1 - Swapped, specifies bit 0 is channel B, bit 1 is channel A.
0x38 | PhiMode | 1 | 0x0 | Indicates whether the PHI is operating in master or slave mode. 0 - Slave Mode. 1 - Master Mode
0x3C-0x40 | PhiSerialOrder | 2 × 1 | 0x0 | Specifies the serialization order of dots before transfer to the printhead. Bus 0 - Printhead Channel A. Bus 1 - Printhead Channel B. If set to zero the order is dot[1:0], then dot[3:2], then dot[5:4]. If set to one then the order is dot[5:4], dot[3:2], dot[1:0].
0x44-0x48 | PrintHeadSize | 2 × 16 | 0x0000 | Specifies the number of non-margin dots in the printhead ICs (must be even). If margining is to be used then the configured PrintHeadSize should be adjusted by the dot margin value, i.e. PrintHeadSize = (PhysicalPrintHeadSize − (DotMargin * 2)). Value must be divisible by 2 (i.e. the low bit must be 0). Bus 0 - Specifies printhead on Channel A. Bus 1 - Specifies printhead on Channel B

CPU Direct PHI Control (See Table 213.)
0x4C | PrintHeadCpuIn | 3 | 0x0 | PHI interface pins input status. Only active in direct CPU mode (Read Only Register)
0x50 | PrintHeadCpuDir | 3 | 0x0 | PHI interface pins direction control. Only active in direct CPU mode
0x54 | PrintHeadCpuOut | 10 | 0x000 | PHI interface pins output control. Only active in direct CPU mode
0x58 | PrintHeadCpuCtrl | 1 | 0x1 | Controls direct CPU access to the PHI pins. 0 - Normal Mode. 1 - Direct CPU Control mode
0x5C | PrintHeadCpuCtrlMode | 1 | 0x0 | Specifies if the pin is controlled by the PrintHeadCpuOut register or by the Fire generator logic. Only active when PrintHeadCpuCtrl is 1 and pin is in output mode. Bit 0 controls the frclk pin. When the bit is 0 - Pin is controlled by PrintHeadCpuOut. 1 - Pin is controlled by Fire Generator Logic

Line Sync Control
0x60 | LsyncHigh | 24 | 0x0_0000 | In Master mode specifies the number of pclk cycles phi_lsyncl should remain high. In Slave mode specifies the minimum number of pclk cycles between Lsync pulses. Lsync pulses of a shorter period will cause the PHI to halt due to buffer underrun.
0x64 | LsyncLow | 16 | 0x0000 | Number of pclk cycles phi_lsyncl should remain low.
0x68 | LsyncPre | 16 | 0x0000 | Number of pclk cycles between PrintStart rising transition and the generated phi_lsyncl falling edge
0x6C | LsyncDeglitchCnt | 4 | 0x3 | Number of pclk cycles to filter the incoming Lsync pulse from the master. Only used in slave mode.
0x70 | LineSyncInterrupt | 16 | 0x0000 | Number of line syncs to occur before generating an interrupt. When set to zero the interrupt is disabled.

Shift Register Control
0x74 | SrclkPre | 14 | 0x0000 | Number of pclk cycles between phi_lsyncl falling edge and phi_srclk pulse generation, or printhead data transfer
0x78 | SrclkPost | 14 | 0x0000 | Number of pclk cycles of allowed margin from the last srclk pulse in a line to before the next line sync
0x7C-0x80 | PrintHeadRate[1:0] | 2 × 16 | 0xFFFF | Specifies the active to inactive ratio of phi_srclk for the printhead ICs. A 1 indicates Active. Bus 0 - Printhead IC channel A. Bus 1 - Printhead IC channel B
0x84 | DotOrderMode | 1 | 0x0 | Specifies the dot transmit order to the printhead Channel A. Printhead Channel B is always the opposing order. 0 - Even before Odd dots. 1 - Odd before Even dots

Fire Control
0x98 | FrclkPre | 14 | 0x0000 | Number of pclk cycles after lsyncl transitions from 0 to 1 to phi_frclk pulse generation
0x9C | FrclkLow | 14 | 0x0000 | Number of pclk cycles phi_frclk should remain low.
0xA0 | FrclkHigh | 14 | 0x0000 | Number of pclk cycles phi_frclk should remain high.
0xA4 | FrclkNum | 16 | 0x0000 | Number of phi_frclk pulses per line time.
0xA8 | FireGenSoftTrigger | 1 | 0x0 | Only active when PrintHeadCpuCtrlMode is set to 1, PrintHeadCpuCtrl is 1 and pin is in output mode. Bit 0 controls the frclk generator. A 0 to 1 transition on a bit triggers the corresponding generator to create the programmed pulse profile (configured by FrclkNum, FrclkHigh, FrclkLow, FrclkPre registers); when complete the bit gets reset to 0.

Working Registers
0xAC-0xB0 | LineDotCnt | 2 × 16 | 0x0000 | Indicates the number of dots processed in the current line. Bus 0 - Printhead Channel A. Bus 1 - Printhead Channel B (Read Only Registers)

[4210] The configuration registers in the PHI block are clocked at pclk rates but some blocks in the PHI are clocked by different and asynchronous clocks. Configuration values are not re-synchronized; it is therefore important that the Go register be set to zero while updating configuration values. This prevents logic from entering unknown states due to metastable clock domain transfers.

[4211] Some registers can be written to at any time, such as the direct CPU control registers (PrintheadCpuIn, PrintheadCpuDir, PrintheadCpuOut and PrintheadCpuCtrl), the Go register and the PrintStart register. All registers can be read from at any time.

[4212] 32.9.4 Dot Counter

[4213] The dot counter keeps a running count of the number of dots fired for each color plane. The counters are 32 bits wide and will saturate. When the CPU wants to read the dot count for a particular color plane it must write to the DotCountSnap register. This causes all 6 running counter values to be transferred to the DotCount registers in the configuration registers block. The running counter values are reset.

    // reset if being snapped
    if (dot_cnt_snap == 1) then {
        dot_count[5:0] = accum_dot_count[5:0]
        accum_dot_count[5:0] = 0
    }
    // update the counts
    for (color = 0; color < 6; color++) {
        if (accum_dot_count[color] != 0xffff_ffff) {
            // data valid, first dot stream
            data_valid = ((phi_llu_ready[0] == 1) AND (llu_phi_avail[0] == 1))
            if ((data_valid == 1) AND (llu_phi_data[0][color] == 1)) then
                accum_dot_count[color]++
            // data valid, second dot stream
            data_valid = ((phi_llu_ready[1] == 1) AND (llu_phi_avail[1] == 1))
            if ((data_valid == 1) AND (llu_phi_data[1][color] == 1)) then
                accum_dot_count[color]++
        }
    }
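The dot counter behaviour described above can be modelled in software as follows. This is a behavioural sketch only, not the hardware implementation: the class and method names are hypothetical, each dot stream is represented as a 6 bit integer with one bit per color plane, and saturation is checked before every increment (a slightly stricter check than the once per cycle test in the pseudocode) so that the model never exceeds 32 bits.

```python
MAX_COUNT = 0xFFFF_FFFF   # 32 bit saturation limit


class DotCounter:
    def __init__(self):
        self.accum = [0] * 6       # AccumDotCount[5:0], running counts
        self.dot_count = [0] * 6   # DotCount[5:0], snapshot visible to the CPU

    def snap(self):
        """Model a write to DotCountSnap: copy, then clear the running counts."""
        self.dot_count = list(self.accum)
        self.accum = [0] * 6

    def clock(self, even_valid, even_data, odd_valid, odd_data):
        """Model one cycle; a stream is valid when ready and avail are both 1."""
        for color in range(6):
            for valid, data in ((even_valid, even_data), (odd_valid, odd_data)):
                fired = valid and (data >> color) & 1
                if fired and self.accum[color] < MAX_COUNT:
                    self.accum[color] += 1


counter = DotCounter()
counter.clock(True, 0b000011, True, 0b000001)
counter.snap()
print(counter.dot_count)   # [2, 1, 0, 0, 0, 0]
```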

[4214] 32.9.5 Sync Generator

[4215] The sync generator logic has two modes of operation, master and slave mode. In master mode (configured by the PhiMode register) it generates the lsyncl_o output based on configured values and control triggers from the PHI controller. In slave mode it de-glitches the incoming lsyncl_i signal, and filters the lsyncl signal with the minimum configured period.

[4216] After reset or a pulse on phi_go_pulse the machine returns to the Reset state, regardless of what state it's currently in.

[4217] The state machine waits until it's enabled (sync_en==1) by the PHI controller state machine. When enabled it can proceed to the SyncPre or SyncWait state depending on whether the state machine is configured in master or slave mode. In master mode it generates the lsyncl pulses; in slave mode it receives and filters the lsyncl pulses from the master sync generator.

[4218] On transition to the SyncPre state a counter is loaded with the LsyncPre value, and while in the SyncPre state the counter is decremented. When the count is zero the machine proceeds to the SyncLow state, loading the counter with the LsyncLow value.

[4219] The machine waits in the SyncLow state until the counter has decremented to zero. It proceeds to the SyncHigh state, pulsing the line_st signal on transition, and counts LsyncHigh number of cycles. The line_st pulse indicates to the PHI controller the line start aligned to the lsyncl positive edge. While in the SyncLow state the lsyncl_o output is set to 0 and in SyncHigh the lsyncl_o output is set to 1. When the count is zero and the current line is not the last (last_line==0), the machine returns to the SyncLow state to begin generating a new line sync pulse. The transition pulses the line_fin signal to the PHI controller.

[4220] The loop is repeated until the current line is the last (last_line==1), at which point the machine returns to the Reset state to wait for the next page start.

[4221] In slave mode the state machine proceeds to the SyncWait state when enabled. It waits in this state until a lsync_pulse_rise is received from the input de-glitch circuit. When a pulse is detected the machine jumps to the SyncPeriod state and begins counting down the LsyncHigh number of clock cycles before returning to the SyncWait state. Note that in slave mode LsyncHigh specifies the minimum number of pclk cycles between Lsync pulses. On transition from the SyncWait to the SyncPeriod state the line_st signal to the PHI controller is pulsed to indicate the line start. If a lsync_pulse_fall is detected while in the SyncPeriod state, the state machine will signal a sync error (via sync_err) to the PHI controller and cause a buffer underrun interrupt.

[4222] 32.9.5.1 Lsyncl Input De-Glitch

[4223] The lsync_i input is considered an asynchronous input to the PHI, and is passed through a synchronizer to reduce the possibility of metastable states occurring before being passed to the de-glitch logic.

[4224] The input de-glitch logic rejects input states of duration less than the configured number of clock cycles (lsync_deglitch_cnt). Input states of greater duration are reflected on the output, and are negative and positive edge detected to produce the lsync_pulse_fall and lsync_pulse_rise signals to the main generator state machine. The counter logic is given by

    if (lsync_i != lsync_i_delay) then
        cnt = lsync_deglitch_cnt
        output_en = 0
    elsif (cnt == 0) then
        cnt = cnt
        output_en = 1
    else
        cnt--
        output_en = 0
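The de-glitch counter logic above can be exercised with a small software model. The sketch is an assumption-laden illustration: the class name is hypothetical, lsync_i_delay is taken to be the input sampled on the previous cycle, and the filtered output is assumed to capture the input whenever output_en is 1.

```python
class Deglitch:
    def __init__(self, deglitch_cnt, initial=1):
        self.deglitch_cnt = deglitch_cnt
        self.prev = initial        # lsync_i_delay (assumed: previous sample)
        self.cnt = deglitch_cnt
        self.out = initial         # filtered output (assumed capture register)

    def step(self, lsync_i):
        """Process one pclk sample and return the filtered output."""
        if lsync_i != self.prev:
            self.cnt = self.deglitch_cnt   # input changed, restart the count
            output_en = False
        elif self.cnt == 0:
            output_en = True               # input has been stable long enough
        else:
            self.cnt -= 1
            output_en = False
        self.prev = lsync_i
        if output_en:
            self.out = lsync_i
        return self.out


filt = Deglitch(deglitch_cnt=3)
glitch = [filt.step(s) for s in [0, 0, 1, 1, 1, 1, 1]]
print(glitch)    # [1, 1, 1, 1, 1, 1, 1], the two cycle low glitch is rejected
stable = [filt.step(0) for _ in range(5)]
print(stable)    # [1, 1, 1, 1, 0], a sustained low level passes through
```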

[4225] 32.9.5.2 Line Sync Interrupt Logic

[4226] The line sync interrupt logic counts the number of line syncs that occur (either internally or externally generated line syncs) and determines whether to generate an interrupt or not. The number of line syncs it counts before an interrupt is generated is configured by the LineSyncInterrupt register. The interrupt is disabled if LineSyncInterrupt is set to zero.

    // implement the interrupt counter
    if (phi_go_pulse == 1) then
        line_count = 0
    elsif ((line_st == 1) AND (line_count == 0)) then
        line_count = linecount_int
    elsif ((line_st == 1) AND (line_count != 0)) then
        line_count--
    // determine when to pulse the interrupt
    if (linesync_int == 0) then
        // interrupt disabled
        phi_icu_linesync_int = 0
    elsif ((line_st == 1) AND (line_count == 1)) then
        phi_icu_linesync_int = 1

[4227] 32.9.6 Fire Generator

[4228] The fire generator block creates the signal profile for the phi_frclk signal to the printhead. The frclk is based on configured values and is timed in relation to the fire_st pulse from the PHI controller block. Should the phi_frclk state machine receive a fire_st pulse before it has completed the sequence, the machine will restart regardless of its current state.

[4229] Alternatively the frclk state machine can be triggered by software to generate its configured pulse profile. A low to high transition on the FireGenSoftTrigger register will cause a pulse on soft_frclk_st, triggering the state machine to begin generating the pulse profile. When the state machine has completed its sequence it will clear the FireGenSoftTrigger register bit (via the soft_fire_clr signal). The FireGenSoftTrigger register will only be active when the printhead interface is in CPU direct control mode (PrintheadCpuCtrl=1), the fire generator is in software trigger mode (PrintheadCpuCtrlMode[x]=1) and the pin is configured to be in output mode (PrintheadCpuDir[x]=1).

[4230] The fire generator consists of a state machine for creating the phi_frclk signal. The phi_frclk signal is generated relative to the lsyncl signal.

[4231] The machine is reset to the Reset state when phi_go_pulse==1 or the reset is active, regardless of the current state.

[4232] The machine waits in the Reset state until it receives a fire_st pulse from the PHI controller (or a soft_fire_st from the configuration registers). The controller will generate a fire_st pulse at the beginning of each dot line. On the state transition the cycle counter is loaded with the FrclkPre value and the repeat counter is loaded with the FrclkNum value.

[4233] The state machine waits in the FirePre state until the cycle counter is zero, after which it jumps to the FireHigh state and loads the cycle counter with the FrclkHigh value. Again the state machine waits until the count is zero and then proceeds to the FireLow state. On transition the cycle counter is loaded with the FrclkLow value. The state machine waits in the FireLow state while the cycle counter is decremented.

[4234] When the cycle counter reaches zero and the repeat_count is non-zero, the repeat_count is decremented, the cycle counter is loaded with the FrclkHigh value and the state machine jumps to the FireHigh state to repeat the phi_frclk generation cycle. The loop is repeated until the repeat_count is zero. The state machine then goes to the Reset state, resetting the FireGenSoftTrigger register (via the soft_fire_clr signal) on the transition, and waits for the next fire_st pulse.

[4235] When in the Reset state the fire_rdy signal is active to indicate to the controller that the fire generator is ready.

[4236] 32.9.7 PHI Controller

[4237] The PHI controller is responsible for controlling all functions of the PHI block on a line by line basis. It controls and synchronizes the sync generator, the fire generator, and datapath unit, as well as signalling back to the CPU the PHI status. It also contains a line counter to determine when a full page has completed printing.

[4238] The PHI controller state machine is reset to the Reset state by a reset or phi_go_pulse==1.

[4239] It will remain in reset until the block is enabled by phi_go==1. Once enabled the state machine will jump to the FirstLine state, trigger the transfer of one line of data to the printhead (data_st==1) and the line counter will be initialized to the page length (PageLenLine). Once the line is transferred (data_fin from the datapath unit) the machine will go to the PrintStart state and signal the CPU using an interrupt that the PHI is ready to begin printing (phi_icu_print_rdy). The line counter will also be decremented. It will then wait in the PrintStart state until the CPU acknowledges the print ready signal and enables printing by writing to the PrintStart register.

[4240] The state machine proceeds to the SyncWait state and waits for a line start condition (line_st==1). The line start condition is different depending on whether the PHI is configured as being in a master or slave SoPEC (the PhiMode register). In either case the sync generator determines the correct line start source and signals the PHI controller via the line_st signal. Once received the machine proceeds to the LineTrans state, with the transition triggering the fire generator to start (fire_st), the datapath unit to start (data_st) and the sync generator to start (sync_st).

[4241] While in the LineTrans state the fire, sync and datapath units will be producing line data. When finished processing a line the datapath unit will assert the line finished (data_fin) signal. If the line counter is not equal to 1 (i.e. not the last line) the state machine will jump back to the SyncWait state and wait for the start condition for the next line. The line counter will be decremented. If the line counter is one then the machine will proceed to the LastLine state.

[4242] The LastLine state generates one more line of fire pulses to print the last line held in the shift registers of the printhead. Once complete (fire_fin==1) the state machine returns to the Reset state and waits for the next page of data. On page completion the state machine generates a phi_icu_page_finish interrupt to signal to the CPU that the page has completed; the phi_icu_page_finish will also cause the Go register to reset automatically.

[4243] While the state machine is in the LineTrans state (or in the FirstLine state and the PHI is in slave mode) and waiting for the datapath unit to complete line processing, it is possible (e.g. due to an excessive PEP stall) that a line finish condition occurs (line_fin==1) but the datapath unit is not ready. In this case an underrun error is generated. The state machine goes to the Underrun state and generates a phi_icu_underrun interrupt to the CPU. The PHI cannot recover from a buffer underrun error; the CPU must reset the PEP blocks and re-start printing. The phi_icu_underrun will also cause the Go register to reset automatically.
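The PHI controller state transitions described in this section can be summarised by the following sketch. It is a simplified behavioural model, not the implementation: the function and enum names are hypothetical, events are passed as a set of asserted signal names, and the reset on phi_go_pulse and the slave mode underrun in the FirstLine state are omitted for brevity.

```python
from enum import Enum


class State(Enum):
    RESET = "Reset"
    FIRST_LINE = "FirstLine"
    PRINT_START = "PrintStart"
    SYNC_WAIT = "SyncWait"
    LINE_TRANS = "LineTrans"
    LAST_LINE = "LastLine"
    UNDERRUN = "Underrun"


def step(state, lines_left, page_len, events):
    """Return (next_state, lines_left) given the signals asserted this cycle."""
    if state is State.RESET and "phi_go" in events:
        return State.FIRST_LINE, page_len            # load line counter
    if state is State.FIRST_LINE and "data_fin" in events:
        return State.PRINT_START, lines_left - 1     # first line transferred
    if state is State.PRINT_START and "print_start" in events:
        return State.SYNC_WAIT, lines_left
    if state is State.SYNC_WAIT and "line_st" in events:
        return State.LINE_TRANS, lines_left
    if state is State.LINE_TRANS:
        if "data_fin" in events:
            if lines_left != 1:
                return State.SYNC_WAIT, lines_left - 1
            return State.LAST_LINE, lines_left
        if "line_fin" in events:                     # datapath not ready in time
            return State.UNDERRUN, lines_left
    if state is State.LAST_LINE and "fire_fin" in events:
        return State.RESET, lines_left               # page finished
    return state, lines_left


# Walk a hypothetical three line page through the controller.
state, left = State.RESET, 0
for ev in ["phi_go", "data_fin", "print_start", "line_st", "data_fin",
           "line_st", "data_fin", "fire_fin"]:
    state, left = step(state, left, 3, {ev})
    print(ev, "->", state.value)
```

In this walk the datapath reports data_fin three times, once per line of the page, before the final fire_fin returns the controller to the Reset state.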

[4244] 32.9.8 CPU IO Control

[4245] The CPU IO control block is responsible for providing direct CPU control of the IO pins via the configuration registers. It also accepts the input signals from the printhead and re-synchronizes them to the pclk domain, and accepts debug signals from the RDU and muxes them to output pins. Table contains the direct mapping of configuration registers to printhead IO pins. Direct CPU control is enabled only when PrintheadCpuCtrl is set to one. In normal operation (i.e. PrintheadCpuCtrl==0) the printhead frclk pin is always in output mode (phi_frclk_e=1), the phi_lsyncl pin will be in output mode if the SoPEC is the master, i.e. phi_lsyncl_e=phi_mode, and readl will be set high.

[4246] The PrintheadCpuCtrlMode register determines whether the frclk pin should be driven by the fire generator logic or directly from the CPU PrintheadCpuOut register.

[4247] The pseudocode for the CPU IO control is:

    if (printhead_cpu_ctrl == 1) then  // CPU access enabled
        // outputs
        if (PrintHeadCpuCtrlMode[0] == 1) then  // fire generator controlled
            phi_frclk_o = frclk
        else  // normal direct CPU control
            phi_frclk_o = printhead_cpu_out[1]
        phi_ph_data_o[0][1:0] = printhead_cpu_out[4:3]
        phi_ph_data_o[1][1:0] = printhead_cpu_out[6:5]
        phi_srclk[1:0] = printhead_cpu_out[8:7]
        phi_readl = printhead_cpu_out[9]
        // direction control
        phi_lsyncl_e = printhead_cpu_dir[0]
        phi_frclk_e = printhead_cpu_dir[1]
        // input assignments
        printhead_cpu_in[0] = synchronize(phi_lsyncl_i)
        printhead_cpu_in[1] = synchronize(phi_frclk_i)
    else  // normal connections
        // outputs
        phi_ph_data_o[0][1:0] = ph_data[0][1:0]
        phi_ph_data_o[1][1:0] = ph_data[1][1:0]
        phi_lsyncl_o = lsync_o
        phi_readl = 1
        phi_srclk[1:0] = srclk[1:0]
        phi_frclk_o = frclk
        // direction control
        phi_frclk_e = 1
        phi_lsyncl_e = phi_mode  // depends on Master or Slave mode
    // inputs
    lsyncl_i = phi_lsync_i  // connected regardless
    // debug overrides any other connections
    if (debug_cntrl[0] == 1) then
        phi_frclk_o = debug_data_valid
        phi_frclk_e = 1
        phi_readl = pclk

[4248] The debug signalling is controlled by the RDU block (see Section 11.8 Realtime Debug Unit (RDU)). The IO control in the PHI muxes debug data onto the PHI pins based on the control signals from the RDU.

[4249] 32.9.9 Datapath Unit

[4250] 32.9.10 Dot Order Controller

[4251] The dot order controller is responsible for controlling the dot order blocks. It monitors the status of each block and determines the switch over point, at which the connections from odd and even dot streams to printhead channels are swapped.

[4252] The machine is reset to the Reset state when phi_go_pulse==1 or the reset is active. The machine will wait until it receives a data_st pulse from the PHI controller before proceeding to the LineStart state. On the transition to the LineStart state it will reset the dot counter in each dot order block via the dot_cnt_rst signal.

[4253] While in the LineStart state both dot order blocks are enabled (gen_en==1). The dot order blocks process data until each of them reaches its mid point. The mid point of a line is defined by the configured printhead size (i.e. print_head_size). When a dot order block reaches the mid point it immediately stops processing and waits for the remaining dot order block. When both dot order blocks are at the mid point (mid_pt==11) the controller clocks through the LineMid state to allow the pipeline to empty and immediately goes to the LineEnd state.

[4254] In the LineEnd state the mode_sel is switched and the dot order blocks are re-enabled. In this state the dot order blocks read data from the opposite LLU dot data stream to the one used in the LineStart state. The controller remains in the LineEnd state until both dot order blocks have processed a line, i.e. line_fin==11.

[4255] On completion of both blocks the controller returns to the Reset state and again awaits the next data_st pulse from the PHI controller. When in the Reset state the machine signals the PHI controller that it is ready to begin processing dot data via the dot_order_rdy signal.

[4256] The dot order controller selects which dot streams should feed which printhead channels. The order can be changed by configuring the DotOrderMode register. In all cases Channel A and Channel B must be in opposing dot order modes. Table 216 shows the possible modes of operation.

TABLE 216 Mode selection in Dot order controller.

    Channel  Mode_sel  DotOrderMode  Dot transmit order
    A        0         0             Even before Odd (EBO mode), even dot stream feeds Channel A printhead, first half line.
    A        0         1             Odd before Even (OBE mode), odd dot stream feeds Channel A printhead, first half line.
    A        1         0             Even before Odd (EBO mode), even dot stream feeds Channel A printhead, second half line.
    A        1         1             Odd before Even (OBE mode), odd dot stream feeds Channel A printhead, second half line.
    B        0         0             Odd before Even (OBE mode), odd dot stream feeds Channel B printhead, second half line.
    B        0         1             Even before Odd (EBO mode), even dot stream feeds Channel B printhead, second half line.
    B        1         0             Odd before Even (OBE mode), odd dot stream feeds Channel B printhead, first half line.
    B        1         1             Even before Odd (EBO mode), even dot stream feeds Channel B printhead, first half line.

[4257] 32.9.10.1 Dot Order Unit

[4258] The dot order unit accepts dot data from either dot stream from the LLU and writes the dot data into the dot buffer. It has two modes of operation, odd before even (OBE) and even before odd (EBO). In OBE mode, data from the odd dot stream is accepted first, then data from the even dot stream; in EBO mode the order is reversed. The mode is configurable via the DotOrderMode register.

[4259] The dot order unit maintains a dot count that is decremented each time a new dot is received from the LLU. The dot order controller resets the dot counter to print_head_size[15:0] at the start of a new line via the dot_cnt_rst signal. The dot count is compared with half the printhead size (print_head_size[15:0] divided by 2) to determine the mid point (mid_pt); the line finish point (line_fin) is reached when the dot counter is zero.

[4260] The mid point is defined as half the number of dots in a particular printhead, and is derived from the print_head_size bus by dividing by 2 and rounding down.

    // define the mid point
    if (dot_cnt[15:0] == print_head_size[15:1]) then
        mid_pt = 1
    else
        mid_pt = 0
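As an illustration of the mid point comparison, the following Python sketch models the selection of print_head_size[15:1] as a right shift by one bit, which is equivalent to dividing by 2 and rounding down. The function names mid_point and is_mid_pt are hypothetical and are used only for this example.

```python
def mid_point(print_head_size: int) -> int:
    """Model print_head_size[15:1]: drop bit 0 of the 16-bit size."""
    return (print_head_size & 0xFFFF) >> 1


def is_mid_pt(dot_cnt: int, print_head_size: int) -> int:
    """Return 1 when the down-counting dot counter equals the mid point."""
    return 1 if (dot_cnt & 0xFFFF) == mid_point(print_head_size) else 0
```

Because the dot counter is loaded with print_head_size and counts down, the comparison becomes true once half of the dots of the line have been received.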

[4261] The dot order unit logic maintains the dot data write pointer. Each time a new dot is written to the dot buffer the write pointer is incremented. The fill level of the dot buffer is determined by comparing the read and write pointers. The fill level is used to determine when to backpressure the LLU (ready signal) due to the dot buffer filling. A suitable threshold value is determined to allow for the full LLU pipeline to empty into the dot buffer.

[4262] The dot order stalling control is given by:

    // determine the ready/avail signal to use, based on mode select
    if (mode_sel == 1) then
        dot_active = llu_phi_avail[0] AND ready
        wr_data = llu_phi_data[0]
    else
        dot_active = llu_phi_avail[1] AND ready
        wr_data = llu_phi_data[1]
    // update the counters
    if (dot_active == 1) then {
        wr_en = 1
        wr_adr++
        if (dot_cnt == 0) then
            dot_cnt = print_head_size
        else
            dot_cnt--
    }

[4263] The dot writer needs to determine when to stall the LLU dot data stream. A number of factors could stall the dot stream in the LLU, such as the buffer filling, waiting for the mid point, waiting for the line finish, or the dot order controller waiting for the line start condition from the PHI controller.

[4264] The stall logic is given by:

    // determine when to stall the LLU generator
    fill_level = wr_adr - rd_adr
    if (fill_level > (32 - THRESHOLD)) then  // THRESHOLD is open value
        ready = 0  // buffer is close to full
    elsif (gen_en == 0) then
        ready = 0  // stalled by the datapath controller
    else
        ready = 1  // everything good no stall
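The stall decision can be exercised with a short Python model. This is a sketch under stated assumptions: the buffer depth of 32 is taken from the pseudocode, while the default threshold of 4 and the 6-bit free-running pointer width used for the wrap-around subtraction are assumptions, since the text leaves THRESHOLD open and does not give the pointer width. The function name llu_ready is hypothetical.

```python
BUFFER_DEPTH = 32  # from the "32 - THRESHOLD" term in the stall logic


def llu_ready(wr_adr: int, rd_adr: int, gen_en: int,
              threshold: int = 4, ptr_bits: int = 6) -> int:
    """Return the ready signal (1 = LLU may send a dot, 0 = stall)."""
    # assumed: pointers are free-running ptr_bits-wide counters, so the
    # subtraction is taken modulo 2**ptr_bits
    fill_level = (wr_adr - rd_adr) % (1 << ptr_bits)
    if fill_level > (BUFFER_DEPTH - threshold):
        return 0  # buffer is close to full
    if gen_en == 0:
        return 0  # stalled by the datapath controller
    return 1      # no stall
```

The threshold leaves headroom so that dots already in flight in the LLU pipeline can still be written after ready is deasserted.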

[4265] 32.9.10.2 Data Generator

[4266] The data generator block reads data from the dot buffer and feeds dot data to the printhead at a configured rate (set by PrintheadRate). It also generates the margin zero data and aligns the dot data generation to the synchronization pulse from the PHI controller.

[4267] The data generator controller waits in the Reset state until it receives a line start pulse from the PHI controller (data_st signal). Once a start pulse is received it proceeds to the SrclkPre state, loading a counter with the SrclkPre value. While in this state it decrements the counter. No data is read or output at this stage. When the count is zero the machine proceeds to the DataGen1 state.

[4268] On transition it loads the counter with the printhead size (print_head_size). If margining is to be used then the configured print_head_size should be adjusted by the dot margin value, i.e. print_head_size=(physical_print_head_size-(dot_margin*2)).

[4269] Dot data is transferred to the printhead serializer in dot-pairs, with one dot-pair transferred every 3 pclk cycles. To construct a dot data pair the state machine reads one dot in the DataGen1 state, one dot in the DataGen2 state and waits for one clock cycle in the DataGen3 state while the data is transferred to the data serializer. The counter will decrement for every dot data word transferred. The exact data rate is dictated by the dot buffer fill levels and the configured printhead rate (PrintheadRate). When in the DataGen3 state the machine determines whether it should wait for 3 cycles or transfer another dot pair to the data serializer. The generator determines the rate by comparing the rate counter (rate_cnt) with the configured PrintheadRate value. If the bit selected by rate_cnt in the print_head_rate bus is one, data is transferred; otherwise the 3 cycles are skipped (Wait1, Wait2 and Wait3). If PrintheadRate is set to all zeros then no data will ever be transferred. The rate counter (rate_cnt) is decremented while in the DataGen2 and Wait2 states. The rate counter is allowed to wrap normally.

[4270] The pseudo-code for the rate control DataGen3 (or Wait3) state is given by:

    // decrement the rate count
    rate_cnt--  // happens in DataGen2, or Wait2
    // determine if data should be read
    // first determine if data is available in buffer
    if (rd_adr != wr_adr) then
        if (print_head_rate[rate_cnt] == 1) then
            dot_active = 1
            gate_srclk = 1
            count--
            next_state = DataGen1
        else
            dot_active = 0
            gate_srclk = 0
            next_state = Wait1
    else
        dot_active = 0
        gate_srclk = 0
        next_state = Wait1
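The effect of the PrintheadRate bit pattern on the data rate can be illustrated with a short simulation. The Python sketch below is not the hardware implementation: it assumes a 16-bit print_head_rate bus (the width is not stated in this section), assumes that the dot buffer always has data available, and uses the hypothetical function name transfer_pattern. Each list entry represents one 3 pclk cycle slot, with 1 meaning a dot pair is transferred and 0 meaning the slot is skipped.

```python
def transfer_pattern(print_head_rate: int, slots: int, rate_bits: int = 16):
    """Return a list of 0/1 flags, one per 3-pclk slot."""
    rate_cnt = 0
    pattern = []
    for _ in range(slots):
        # rate_cnt is decremented once per slot and wraps normally
        rate_cnt = (rate_cnt - 1) % rate_bits
        # the bit selected by rate_cnt decides whether data is transferred
        pattern.append((print_head_rate >> rate_cnt) & 1)
    return pattern


# fraction of slots used for two example rate patterns
full_rate = sum(transfer_pattern(0xFFFF, 16)) / 16   # every slot used
half_rate = sum(transfer_pattern(0x5555, 16)) / 16   # every second slot used
```

Under these assumptions, configuring more one bits for the channel that feeds the longer printhead chip gives that chip a proportionally higher data rate, which is the equalisation the controller is intended to provide.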

[4271] When the dot counter reaches zero the state machine will jump to the MarginGen1 state if the configured margin value is non-zero, otherwise it will jump directly to the SrclkPost state. On transition to the MarginGen1 state it loads the cycle counter with the dot_margin value, and begins to count down. While in the MarginGen1, MarginGen2 and MarginGen3 state machine loop the data generator logic block writes dot data to the printhead but does not read from the dot buffers. It creates zero dot data words for the margin duration. As with normal dot data, it creates one dot in each of the MarginGen1 and MarginGen2 states, then waits a clock cycle to allow the transfer to the data serializer to complete.

[4272] When the counter reaches zero the machine jumps to the SrclkPost state, loads the clock counter with the SrclkPost value and decrements it. When the count is finished the state machine returns to the Reset state and awaits the next start pulse. Should a line sync arrive before the data generators have completed (data_fin signal) the PHI controller will detect a print error and stall the PHI interface.

[4273] As a consequence of the data transfer mechanism of dot pair cycles followed by a wait state, the printhead size (print_head_size) and dot margin (dot_margin) must always be even dot values.
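The margin adjustment of print_head_size and the even-value constraint can be combined in a small helper. This Python sketch is illustrative only; the function name configured_print_head_size is hypothetical, and raising an error for odd values is a choice made for the example rather than behaviour described for the hardware.

```python
def configured_print_head_size(physical_print_head_size: int, dot_margin: int) -> int:
    """Return the print_head_size value to configure when margining is used."""
    if physical_print_head_size % 2 or dot_margin % 2:
        # dots are transferred in pairs, so both values must be even
        raise ValueError("printhead size and dot margin must be even")
    return physical_print_head_size - (dot_margin * 2)
```

For example, a physical size of 6912 dots with a dot margin of 10 gives a configured size of 6892 dots; both input values are chosen only for illustration.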

[4274] 32.9.10.3 Data Serializer

[4275] The data serializer block converts 12-bit dot data at pclk rates (nominally 160 MHz) to 2-bit data at doclk rates (nominally 320 MHz).

[4276] The srclk is only active when data is available for transfer to the printhead, as enabled by the gate_srclk signal. The data rate mechanism in the data generator block means that data is not transferred to the printhead on every set of 3 pclk cycles. Both the dot_data and gate_srclk signals are controlled by the data generator block and can only change on a fixed 3 pclk cycle boundary. Data is transferred to the printhead on both edges of srclk (i.e. double data rate, DDR). Directly after a line sync pulse the mux control logic and the srclk generation logic are reset to a known state (the srclk is set high). Before data can begin transfer to the printhead the logic must generate a line setup edge on srclk, causing srclk to go low. The line setup edge happens SrclkPre number of pclk cycles after the line sync falling edge (indicated by the sr_init signal from the data generator block).

[4277] All data transfers to the printhead will be in groups of 6 2-bit data words, each word clocked on an edge of srclk. For each group srclk will start low and end low.

[4278] At the end of a full line of data transfer the srclk must generate a line complete edge to return the srclk to a high state before the next line sync pulse. The data generator block generates a sr_com signal to indicate that the data transfer to the printhead has completed and that the line complete edge can be inserted. The sr_com signal is generated before the SrclkPost period.

[4279] The data serializer block allows easy separation of clock gating and clock to logic structures from the rest of the PHI interface.

[4280] The mux logic determines which data bits from the dot_data bus should be selected for output on the ph_data bus to the printhead. The mux selector is initialized by an edge detect on the sr_init signal from the data generator.

    // determine wrap and init points
    if (phi_serial_order == 1) then
        mux_wrap = 5
        mux_init = 0
    else
        mux_wrap = 0
        mux_init = 5
    // the mux selector logic
    if ((sr_init_edge == 1) OR (mux_sel == mux_wrap)) then
        mux_sel = mux_init
    elsif (phi_serial_order == 1) then
        mux_sel--  // decrement order
    else
        mux_sel++  // increment order

[4281] The dot data serialization order can be configured by the PhiSerialOrder register. If PhiSerialOrder is zero the order is dot[1:0], then dot[3:2], then dot[5:4]. If the register is one then the order is dot[5:4], dot[3:2], dot[1:0].
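The two serialization orders can be illustrated by splitting a single 6-bit dot into its three 2-bit words. The Python sketch below models only the ordering described for PhiSerialOrder; the function name serialize_dot is hypothetical, and the sketch does not model the mux selector or the clocking.

```python
def serialize_dot(dot: int, phi_serial_order: int):
    """Return the three 2-bit words of a 6-bit dot in transmit order."""
    # words[0] = dot[1:0], words[1] = dot[3:2], words[2] = dot[5:4]
    words = [(dot >> shift) & 0b11 for shift in (0, 2, 4)]
    # PhiSerialOrder == 1 sends dot[5:4] first, otherwise dot[1:0] first
    return words[::-1] if phi_serial_order else words
```

Two dots serialized this way give the group of 6 2-bit data words that is sent for each 12-bit dot pair.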

[4282] The srclk control logic is initialized to 1 when a line_st positive edge is detected. If any of sr_com_edge, sr_init_edge or gate_srclk is equal to one, srclk is transitioned. srclk is always clocked out to the output pins on the negative edge of doclk to place the clock edge in the centre of the data.

[4283] The pseudo code for the control logic is:

    if (line_st_edge == 1) then
        srclk_gen = 1
    elsif ((gate_srclk == 1) OR (sr_init_edge == 1) OR (sr_com_edge == 1)) then
        srclk_gen = ~srclk_gen
    else
        // hold
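The srclk control logic can be checked against the requirement that each group of 6 data words starts and ends with srclk low. The following Python sketch assumes that the logic is evaluated once per doclk cycle, so that a gate_srclk period of 3 pclk cycles produces 6 toggles; that evaluation rate is an assumption, and the function name srclk_step is hypothetical.

```python
def srclk_step(srclk_gen: int, line_st_edge: int = 0, gate_srclk: int = 0,
               sr_init_edge: int = 0, sr_com_edge: int = 0) -> int:
    """Return the next srclk_gen value for one evaluation of the logic."""
    if line_st_edge:
        return 1                 # known state after line start: srclk high
    if gate_srclk or sr_init_edge or sr_com_edge:
        return srclk_gen ^ 1     # toggle
    return srclk_gen             # hold


def line_trace(groups: int):
    """Trace srclk over one line: setup edge, data groups, complete edge."""
    trace = [srclk_step(0, line_st_edge=1)]
    trace.append(srclk_step(trace[-1], sr_init_edge=1))    # line setup edge
    for _ in range(groups * 6):                            # 6 edges per group
        trace.append(srclk_step(trace[-1], gate_srclk=1))
    trace.append(srclk_step(trace[-1], sr_com_edge=1))     # line complete edge
    return trace
```

In this model srclk is low after the line setup edge, low again after every complete group, and returns high on the line complete edge.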

[4284] 33 Package and Test

[4285] Test Units

[4286] 33.1 JTAG Interface

[4287] A standard JTAG (Joint Test Action Group) interface is included in SoPEC for bonding and IO testing purposes. The JTAG port will provide access to all internal BIST (Built In Self Test) structures.

[4288] 33.2 Scan Test I/O

[4289] The SoPEC device will require several test IOs for running scan tests. In general scan in and scan out pins will be multiplexed with functional pins.

[4290] 33.3 Analog Test Units

[4291] 33.3.1 USB PHY Testing

[4292] The USB PHY analog macro will contain a built-in test structure, which can be accessed either by the CPU or through the JTAG port.

[4293] 33.3.2 Embedded PLL Testing

[4294] The embedded clock generator PLL will require test access from the JTAG port.

[4295] 34 SoPEC Pinning and Package

[4296] 34.1 Overview

[4297] It is intended that the SoPEC package be a 100 pin LQFP. Anyspare pins in the package may be used by increasing the number ofavailable GPIO pins or adding extra power and ground pin. The pin listshows the minimum pin requirement for the SoPEC device. TABLE 217 SoPECPin List (100 LQFP) I/O Test Rate Freq Test Macro Group Pin Name #pin sDir Type Volt (S/D) (Mhz) Description IO Cell Type Function FunctionClocks and resets Group 1 Xtalin 1 I N/A N/A 32 Crystal AINSA_PM_A NoneInput pin Xtalout 1 O N/A N/A 32 Crystal ABNST_PM_A None output pinGroup 2 reset_n 1 I LVTTL 3.3 v s 10 Asynchronous IT33LTPUT_(—) LTactive low PM_A (leakage reset test) PrintHead Interface Group 3phead_(—) 8 O LVDS 1.5 v d 160 Print head OLVDS15_PM_A None data dataSrclk 4 O LVDS 1.5 v d 160 Print head OLVDS15_PM_A None clock Group 4Readl 1 O LVTTL 3.3 v s 160 Common BT3365T_PM_A A_Clock Print head modecontrol Frclk 1 I/O LVTTL 3.3 v s 160 Common BT3365T_PM_A B_Clock Firepattern shift clock, needs to toggle once per fire cycle phi_spare 1 I/OLVTTL 3.3 v s 160 PHI spare BT3365T_PM_A C_Clock1 pin (old profile pin)Lsyncl 1 I/O LVTTL 3.3 v s 160 Line Sync BT3365T_PM_A C_Clock2 outputfrom Master to Slaves USB Connections Group 5 Usb_hostd 2 I/O Differ-3.3 v s 12 USB BUSB2_PM_A None ential differential data for hostUsb_devd 2 I/O Differ- 3.3 v s 12 USB BUSB2_PM_A None entialdifferential data for device Group 6 usbd_(—) 1 I LVTTL 3.3 v s 10 USBdevice BT3365T_PM_C 1 scan out vbus_(—) VBUS sense power sense usbd_(—)1 O LVTTL 3.3 v s 10 USB device BT3365T_PM_C 1 scan out pull_(—)termination up_en enable JTAG Group 7 Tdo 1 O LVTTL 3.3 v s 10 JTAG TestBT3365T_PM_A C_Clock3 data out port Tms 1 I LVTTL 3.3 v s 10 JTAG TestIT33RIT_PM_A RI mode select Tdi 1 I LVTTL 3.3 v s 10 JTAG TestIT33D1PUT_(—) DI1 data in port PM_A Tck 1 I LVTTL 3.3 v s 10 JTAG TestIT33D2PUT_(—) DI2 access port PM_A clock General Purpose IO Group 8 Gpio4 I/O LVTTL 3.3 v s 32 ISI BT3335PUT_(—) 4 Scanin [3:0] interface PM_Bpins/GPIO 
Group 9 Gpio 4 I/O High 3.3 v S 32 LED driver BT3365T_PM_C 4Scanin PCNT [7:4] Drive pins/ PROGSROM LVTTL general OSC purposeInput/Output Group Gpio 12 I/O LVTTL 3.3 v s 32 General BT3365PUT_(—) 2Scanin DIAGOUT 10 [19:8] purpose PM_B 10 Scanout (aka Input/OutputMRSTR0) Group Gpio 3 I/O LVTTL 3.3 v s 32 General BT3365PUT_(—) CE0_Scan11 [22:20] purpose PM_B TESTM3 Input/Output TSTN1 Group Gpio 10 I/OLVTTL 3.3 v s 32 Functional BT3365T_PM_C 6 Scanin 12 [31:23] Spare IOs 4scanout required for scan test Analog Power IO Group agnd 1 I Power N/AN/A N/A PLL analog AINSD3_PM_A None 13 gnd avdd 1 I Power N/A N/A N/APLL analog AINSD3_PM_A None vdd agnd 1 I Power N/A N/A N/A OscillatorAINSD_PM_A None analog gnd avdd 1 I Power N/A N/A N/A OscillatorAINSD_PM_A None analog vdd Test Only Pin Group TE 1 I CMOS 1.5 v N/A N/ATest Enable IC15TEPDT_(—) Test only 14 PM_A VPP 1 I CMOS 1.5 v N/A N/AFat Wire DRAMVPP_PM Test only Analog Receiver/D river for Embedded DRAMAnalog Inputs VWP 1 I CMOS 1.5 v N/A N/A Fat Wire DRAMVWP_PM Test onlyAnalog Receiver/D river for Embedded DRAM Analog Inputs VREFX 1 I CMOS1.5 v N/A N/A Fat Wire DRAMVREFX_(—) Test only Analog PM Receiver/Driver for Embedded DRAM Analog Inputs DLT 1 I CMOS 1.5 v N/A N/A DRAMIC15DLTPUT_(—) Test only Iddq Test PM MC 1 I CMOS 1.5 v N/A N/A IO ModeIC15MCT_PM_A Test only Control DRAM_EN 1 I CMOS 1.5 v N/A N/A DRAMIC15LTPUT_(—) Test only Enable(EN) PM_A Total Signal Pins 73 Functionalpin count is 62 Test IO count 51 Power Only Pins Group Gnd 8 I Power N/AN/A N/A gnd GND_PM_A None 15 Vdd 4 I Power N/A N/A N/A vdd 1.5 v,VDD150_PM_A None core voltage vdd330 4 I Power N/A N/A N/A vdd 3.3 v,VDD330_PM_A None IO voltage Group vdd/gnd 11 I Power N/A N/A N/A Powerpin GND_PM_A/ None 15 fill, VDD150_P GND.Vdd1.5, M_A/ Vdd3.3 VDD330_PM_Aas required Total Pins 100

[4298] Bilithic Printheads

[4299] 1 Background

[4300] Silverbrook's bilithic Memjet™ printheads are the target printheads for printing systems which will be controlled by SoPEC and MoPEC devices.

[4301] This document presents the format and structure of these printheads, and describes their possible arrangements in the target systems. It also defines a set of terms used to differentiate between the types of printheads and the systems which use them.

[4302] Bilithic Printhead Configurations

[4303] 2 Definitions

[4304] This document presents terminology and definitions used to describe the bilithic printhead systems. These terms and definitions are as follows:

[4305] Printhead Type: There are 3 parameters which define the type of printhead used in a system:

[4306] Direction of the data flow through the printhead (clockwise or anti-clockwise, with the printhead shooting ink down onto the page).

[4307] Location of the left-most dot (upper row or lower row, with respect to V₊).

[4308] Printhead footprint (type A or type B, characterized by the data pin being on the left or the right of V₊, where V₊ is at the top of the printhead).

[4309] Printhead Arrangement: Even though there are 8 printhead types, each arrangement has to use a specific pairing of printheads, as discussed in Section 3. This gives 4 pairs of printheads. However, because the paper can flow in either direction with respect to the printheads, there are a total of eight possible arrangements, e.g. Arrangement 1 has a Type 0 printhead on the left with respect to the paper flow, and a Type 1 printhead on the right. Arrangement 2 uses the same printhead pair as Arrangement 1, but the paper flows in the opposite direction.

[4310] Color 0 is always the first color plane encountered by the paper.

[4311] Dot 0 is defined as the nozzle which can print a dot in the left-most side of the page.

[4312] The Even Plane of a color corresponds to the row of nozzles that prints dot 0.

[4313] Note that in all of the relevant drawings, printheads should be interpreted as shooting ink down onto the page.

[4314] FIG. 295 shows the 8 different possible printhead types. Type 0 is identical to the Right Printhead presented in FIG. 297 in [1], and Type 1 is the same as the Left Printhead as defined in [1].

[4315] While the printheads shown in FIG. 295 look to be of equal width (having the same number of nozzles) it is important to remember that in a typical system, a pair of unequal sized printheads may be used.

[4316] 2.1 Combining Bilithic Printheads

[4317] Although the printheads can be physically joined in the manner shown in FIG. 296, it is preferable to provide an arrangement that allows greater spacing between the 2 printheads, for two main reasons:

[4318] inaccuracies in the backetch

[4319] cheaper manufacturing cost due to decreasing the tolerance requirements in sealing the ink reservoirs behind the printhead

[4320] Failing to account for these inaccuracies and tolerances can lead to misalignment of the nozzle rows both vertically and horizontally, as shown in FIG. 297.

[4321] An even row of color n on printhead A may be vertically misaligned from the even row of color n on printhead B by some number of dots, e.g. in FIG. 297 this is shown to be 5 dots. There can also be horizontal misalignment, in that the even row of color n on printhead A is not necessarily aligned with the even row of color n+1 on printhead A, e.g. in FIG. 297 this horizontal misalignment is 6 dots.

[4322] The resultant conceptual printhead definition, shown in FIG. 297, has properties that are appropriately parameterized in SoPEC and MoPEC to cater for this class of printheads.

[4323] The preferred printheads can be characterized by the following features:

[4324] All nozzle rows are the same length (although they may be horizontally displaced some number of dots even within a color on a single printhead)

[4325] The nozzles for color n on printhead A may not be printing on the same line of the page as the nozzles for color n on printhead B. In the example shown in FIG. 297, there is a 5 dot displacement between adjacent rows of the printheads.

[4326] The exact shape of the join is arbitrary, although it is most likely to be sloping (if sloping, it could be sloping in either direction)

[4327] The maximum slope is 2 dots per row of nozzles

[4328] Although shift registers are provided in the printhead at the 2 sides of the joined printhead, they do not drive nozzles. This means the printable area is less than the actual shift registers, as highlighted by FIG. 298.

[4329] 2.2 Printhead Arrangements

[4330] Table 218 defines the printhead pairing and location of each printhead type, with respect to the flow of paper, for the 8 possible arrangements.

    Printhead      Printhead on left side,    Printhead on right side,
    Arrangement    with respect to the        with respect to the
                   flow of paper              flow of paper
    Arrangement 1  Type 0                     Type 1
    Arrangement 2  Type 1                     Type 0
    Arrangement 3  Type 2                     Type 3
    Arrangement 4  Type 3                     Type 2
    Arrangement 5  Type 4                     Type 5
    Arrangement 6  Type 5                     Type 4
    Arrangement 7  Type 6                     Type 7
    Arrangement 8  Type 7                     Type 6

[4331] 3 Bilithic Printhead Systems

[4332] When using the bilithic printheads, the position of the power/gnd bars coupled with the physical footprint of the printheads means that we must use a specific pairing of printheads together for printing on the same side of an A4 (or wider) page, e.g. we must always use a Type 0 printhead with a Type 1 printhead etc.

[4333] While a given printing system can use any one of the eight possible arrangements of printheads, this document only presents two of them, Arrangement 1 and Arrangement 2, for purposes of illustration. These two arrangements are discussed in subsequent sections of this document. However, the other 6 possibilities also need to be considered.

[4334] The main difference between the two printhead arrangements discussed in this document is the direction of the paper flow. Because of this, the dot data has to be loaded differently in Arrangement 1 compared to Arrangement 2, in order to render the page correctly.

[4335] 3.1 EXAMPLE 1

Printhead Arrangement 1

[4336] FIG. 299 shows an Arrangement 1 printing setup, where the bilithic printheads are arranged as follows:

[4337] The Type 0 printhead is on the left with respect to the direction of the paper flow.

[4338] The Type 1 printhead is on the right.

[4339] Table 219 lists the order in which the dot data needs to be loaded into the above printhead system, to ensure color 0-dot 0 appears on the left side of the printed page.

TABLE 219 Order in which the even and odd dots are loaded for printhead Arrangement 1

    Dot Sense  Type 0 printhead when on the left   Type 1 printhead when on the right
    Odd        Loaded second in descending order.  Loaded first in descending order.
    Even       Loaded first in ascending order.    Loaded second in ascending order.

[4340] FIG. 300 shows how the dot data is demultiplexed within the printheads.

[4341] FIG. 301 and FIG. 302 show the way in which the dot data needs to be loaded into the printheads in Arrangement 1, to ensure that color 0-dot 0 appears on the left side of the printed page. Note that no data is transferred to the printheads on the first and last edges of SrClk.

[4342] 3.2 EXAMPLE 2

Printhead Arrangement 2

[4343] FIG. 303 shows an Arrangement 2 printing setup, where the bilithic printheads are arranged as follows:

[4344] The Type 1 printhead is on the left with respect to the direction of the paper flow.

[4345] The Type 0 printhead is on the right.

[4346] Table 220 lists the order in which the dot data needs to be loaded into the above printhead system, to ensure color 0-dot 0 appears on the left side of the printed page.

TABLE 220 Order in which the even and odd dots are loaded for printhead Arrangement 2

    Dot Sense  Type 0 printhead when on the right  Type 1 printhead when on the left
    Odd        Loaded first in descending order.   Loaded second in descending order.
    Even       Loaded second in ascending order.   Loaded first in ascending order.

[4347] FIG. 304 shows how the dot data is demultiplexed within the printheads.

[4348] FIG. 305 and FIG. 306 show the way in which the dot data needs to be loaded into the printheads in Arrangement 2, to ensure that color 0-dot 0 appears on the left side of the printed page.

[4349] Note that no data is transferred to the printheads on the first and last edges of SrClk.

[4350] 4 Conclusions

[4351] Comparing the signalling diagrams for Arrangement 1 with those shown for Arrangement 2, it can be seen that the color/dot sequence output for a printhead type in Arrangement 1 is the reverse of the sequence for the same printhead in Arrangement 2 in terms of the order in which the color plane data is output, as well as whether even or odd data is output first. However, the order within a color plane remains the same, i.e. odd descending, even ascending.

[4352] From FIG. 307 and Table 221, it can be seen that the plane which has to be loaded first (i.e. even or odd) depends on the arrangement. Also, the order in which the dots have to be loaded (e.g. even ascending or descending etc.) is dependent on the arrangement.

[4353] As well as having a mechanism to cope with the shape of the join between the printheads, as discussed in Section 2.1, if the device controlling the printheads can re-order the bits according to the following criteria, then it should be able to operate in all the possible printhead arrangements:

[4354] Be able to output the even or odd plane first.

[4355] Be able to output even and odd planes in either ascending or descending order, independently.

[4356] Be able to reverse the sequence in which the color planes of a single dot are output to the printhead.

TABLE 221 Order in which even and odd dots and planes are loaded into the various printhead arrangements

    Printhead      Left side of                   Right side of
    Arrangement    printed page                   printed page
    Arrangement 1  Even ascending loaded first    Odd descending loaded first
                   Odd descending loaded second   Even ascending loaded second
    Arrangement 2  Odd descending loaded first    Even ascending loaded first
                   Even ascending loaded second   Odd descending loaded second
    Arrangement 3  Odd ascending loaded first     Even descending loaded first
                   Even descending loaded second  Odd ascending loaded second
    Arrangement 4  Even descending loaded first   Odd ascending loaded first
                   Odd ascending loaded second    Even descending loaded second
    Arrangement 5  Odd ascending loaded first     Even descending loaded first
                   Even descending loaded second  Odd ascending loaded second
    Arrangement 6  Even descending loaded first   Odd ascending loaded first
                   Even descending loaded second  Odd ascending loaded second
    Arrangement 7  Even ascending loaded first    Odd descending loaded first
                   Odd descending loaded second   Even ascending loaded second
    Arrangement 8  Odd descending loaded first    Even ascending loaded first
                   Even ascending loaded second   Odd descending loaded second
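The first two re-ordering criteria can be illustrated with a short Python sketch that produces the load order of dot indices for one nozzle row pair. It is a simplified model under stated assumptions: the function name plane_load_order and its parameters are hypothetical, dots are identified only by their index along the line, and neither the color plane sequence (the third criterion) nor the shape of the join is modelled.

```python
def plane_load_order(num_dots: int, first_plane: str,
                     even_ascending: bool, odd_ascending: bool):
    """Return dot indices in the order they would be loaded."""
    even = list(range(0, num_dots, 2))
    odd = list(range(1, num_dots, 2))
    if not even_ascending:
        even.reverse()
    if not odd_ascending:
        odd.reverse()
    if first_plane == "even":
        return even + odd
    if first_plane == "odd":
        return odd + even
    raise ValueError("first_plane must be 'even' or 'odd'")
```

For an 8-dot line, the left side of Arrangement 1 (even ascending loaded first, odd descending loaded second) corresponds to plane_load_order(8, "even", True, False), which gives 0, 2, 4, 6, 7, 5, 3, 1.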

[4357] CMOS Support on Bilithic Printhead

[4358] 1 Basic Requirements

[4359] To create a two part printhead, of A4/Letter portrait width, to print a page in 2 seconds. Matching Left/Right chips can be of different lengths to make up this length, facilitating increased wafer usage. The left and right chips are to be imaged on an 8 inch wafer by “Stitching” reticle images.

[4360] The memjet nozzles have a horizontal pitch of 32 um, and two rows of nozzles are used for a single colour. These rows have a horizontal offset of 16 um. This gives an effective dot pitch of 16 um, or 62.5 dots per mm, or 1587.5 dots per inch, close enough to market as 1600 dpi.
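The resolution figures follow directly from the nozzle pitch. The short Python calculation below reproduces them; the variable names are chosen for this example only.

```python
NOZZLE_PITCH_UM = 32.0   # horizontal pitch within one nozzle row
ROWS_PER_COLOUR = 2      # two rows, offset by half the pitch
MM_PER_INCH = 25.4

dot_pitch_um = NOZZLE_PITCH_UM / ROWS_PER_COLOUR   # 16 um effective pitch
dots_per_mm = 1000.0 / dot_pitch_um                # 62.5 dots per mm
dots_per_inch = dots_per_mm * MM_PER_INCH          # about 1587.5 dpi

print(dot_pitch_um, dots_per_mm, round(dots_per_inch, 1))
```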

[4361] The first nozzle of the right chip should have a 32 um horizontal offset from the final nozzle of the left chip for the same color row. There is no ink nozzle overlap (of the same colour) scheme employed.

[4362] 1.1 Power Supply

[4363] Vdd/Vpos and Ground supply is made through 30 um wide pads along the length of the chip using conductive adhesive to bus bar beside the chips. Vdd/Vpos is 3.3 Volts. (12V was considered for Vpos but routing of CMOS Vdd at 3.3V would be a problem over the length of the chips; this will be revisited.)

[4364] 1.2 MEMS Cells

[4365] The preferred memjet device requires 180 nJ of energy to fire, with a pulse of current for 1 usec. Assuming 95% efficiency, this requires a 55 ohm actuator drawing 57.4 mA during this pulse.
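The actuator figures can be cross-checked with a short calculation. The Python sketch below assumes that the supply is the 3.3 V Vpos rail and that 95% efficiency means 95% of the power drawn from the supply is dissipated in the actuator; both interpretations are assumptions made for this check.

```python
ENERGY_J = 180e-9     # energy required to fire one nozzle
PULSE_S = 1e-6        # fire pulse duration
SUPPLY_V = 3.3        # assumed: Vpos supply voltage
EFFICIENCY = 0.95     # assumed: fraction of supply power reaching the actuator

actuator_power_w = ENERGY_J / PULSE_S                # 0.18 W in the actuator
supply_power_w = actuator_power_w / EFFICIENCY       # power drawn from the supply
current_a = supply_power_w / SUPPLY_V                # about 57.4 mA
resistance_ohm = actuator_power_w / current_a ** 2   # about 55 ohm

print(round(current_a * 1e3, 1), round(resistance_ohm, 1))
```

Under these assumptions the result is approximately 57.4 mA and approximately 55 ohm, which agrees with the stated values.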

[4366] 1.2.1 Issue!!!

[4367] For 1 page per 2 seconds, or ~300 mm*62.5 (dots/mm)/2 sec ~= 10 kHz, or 100 usec per line. With a 1 usec fire pulse cycle, every 100th nozzle needs to fire at the same time. We have 13824 nozzles across the page, so we fire 138 nozzles at a time.
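The line rate estimate can be reproduced as follows. This Python calculation uses the rounded 100 usec line time from the text for the final step; the variable names are chosen for this example only.

```python
PAGE_LENGTH_MM = 300
DOTS_PER_MM = 62.5
PAGE_TIME_S = 2
FIRE_PULSE_US = 1
NOZZLES_ACROSS_PAGE = 13824

lines_per_page = PAGE_LENGTH_MM * DOTS_PER_MM     # 18750 lines
line_rate_hz = lines_per_page / PAGE_TIME_S       # 9375 Hz, roughly 10 kHz
line_time_us = 100                                # rounded line time from the text
fire_groups = line_time_us // FIRE_PULSE_US       # 100 fire pulses per line
nozzles_per_pulse = NOZZLES_ACROSS_PAGE // fire_groups   # 138 nozzles at a time

print(line_rate_hz, fire_groups, nozzles_per_pulse)
```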

[4368] 1.2.2 64 um Unit Cell Height

[4369] This cell would have 4 line spacing between the odd and even dots, and 8 line spacing between adjacent colours.

[4370] 1.2.3 80 um Unit Cell Height

[4371] This cell would have 5 line spacing between the odd and even dots, and 10 line spacing between adjacent colours.

[4372] 1.3 Versions

[4373] 1.3.1 6 Colour 1600 dpi with 64 um Unit Cell

[4374] Left and Right Chip.

[4375] 1.3.2 6 Colour 1600 dpi with 80 um Unit Cell

[4376] Left and Right Chip.

[4377] 1.3.3 4 Colour 800 dpi with 80 um Unit Cell

[4378] For camera application. Single nozzle row per colour.

[4379] 1.4 Air Supply

[4380] Air must be supplied to the MEMS region through holes in the chip.

[4381] 2 Head Sizes

[4382] The combined heads have 13824 nozzles per colour totalling 221.184 mm of print area. Enough to provide full breadth for A4 (210 mm) and Letter (8.5 inch or 215.9 mm).

TABLE 1 Head Combinations

Left Head                          Right Head
Stitch Parts  Nozzles per Colour   Stitch Parts  Nozzles per Colour
8             11160                2             2664
7             9744                 3             4080
6             8328                 4             5496
5             6912                 5             6912
4             5496                 6             8328
3             4080                 7             9744
2             2664                 8             11160

[4383] Nozzles per Colour is calculated as ((“Stitch Parts”−1)*118+104)*12. Nozzles per row is half this value. Most likely the 8:2 head set will not be manufactured. The preferred wafer layout manages to avoid this set without any losses.
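The formula above can be used to regenerate Table 1 and to confirm that every left/right pairing totals 13824 nozzles per colour. The sketch below assumes, as the table implies, that the stitch parts of a pair always sum to 10; the function names are illustrative.

```python
DOT_PITCH_UM = 16          # effective dot pitch from the Basic Requirements
TOTAL_STITCH_PARTS = 10    # assumption read off Table 1: left + right = 10


def nozzles_per_colour(stitch_parts):
    """Formula from the text: ((Stitch Parts - 1) * 118 + 104) * 12."""
    return ((stitch_parts - 1) * 118 + 104) * 12


def head_combinations():
    """Yield (left parts, left nozzles, right parts, right nozzles)."""
    for left_parts in range(8, 1, -1):
        right_parts = TOTAL_STITCH_PARTS - left_parts
        yield (left_parts, nozzles_per_colour(left_parts),
               right_parts, nozzles_per_colour(right_parts))


table = list(head_combinations())
for row in table:
    print(row, "total:", row[1] + row[3])

print_width_mm = 13824 * DOT_PITCH_UM / 1000
```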

[4384] 3 Interface

[4385] Each print head has the same I/O signals (but the Left and Right versions might have a different pin out).

TABLE 2 I/O pins

Name        I/O  Common   Max Speed (MHz)  Function
Data[0-1]   I    No       320              Dot data for colours 0-5, using Differential Signalling (DataL the complementary signal), colours[0-2] on Data[0], colour[3-5] on Data[1]
DataL[0-1]  I    -        -                Complementary signal of Data[0-1]
SrClk       I    No¹      320              Dot data shift clock using Differential Signalling (SrClkL the complementary signal)
SrClkL      I    -        -                Complementary signal of SrClk
ReadL       I    Yes      1                FrClk, Pr, LSyncL output mode if signal mode bit is set
FrClk       I    Yes      1                Fire pattern shift clock
            O    Yes²     -                Nozzle test result (mode = 0b001), LsyncL = 0; CMOS testing (mode = 0b111), LsyncL = 1
Pr          I    Yes      1³               Pulse Profile for all colours
            O    Yes^(b)  -                Temperature Output (mode = 0b010), LsyncL = 0; CMOS testing (mode = 0b111), LsyncL = 1
LsyncL      I    Yes      0.1⁴             0 - Capture dot data for next print line
            O    Yes^(b)  -                CMOS testing (mode = 0b111), LsyncL = 1

[4386] Pins marked as common can be controlled by the same signal from the controller (SOPEC).

[4387] 3.1 Dot Firing

[4388] To fire a nozzle, three signals are needed: a dot data signal, a fire signal, and a profile signal. When all signals are high, the nozzle will fire.

[4389] The dot data is provided to the chip through a dot shift register with input Data[x], and clocked into the chip with SrClk. The dot data is multiplexed on to the Data signals, as Dot[0-2] on Data[0], and Dot[3-5] on Data[1]. After the dots are shifted into the dot shift register, this data is transferred into the dot latch with a low pulse on LsyncL. The value in the dot latch forms the dot data used to fire the nozzle. The use of the dot latch allows the next line of data to be loaded into the dot shift register at the same time as the dot pattern in the dot latch is being fired.

[4390] Across the top of a column of nozzles, containing 12 nozzles, 2 of each colour (odd and even dots, 4 or 5 lines apart), are two fire register bits and a select register bit. The fire registers form the fire shift register that runs the length of the chip and back again, with one register bit in each direction of flow. The select registers form the Select Shift Register that runs the length of the chip. The select register selects which of the two fire registers is used to enable this column. A ‘0’ in this register selects the forward direction fire register, and a ‘1’ selects the reverse direction fire register. The output of this block provides the fire signal for the column.

[4391] The third signal needed, the profile, is provided for all colours with input Pr across the whole colour row at the same time (with a slight propagation delay per column).

[4392] 3.2 Dot Shift Register Orientation

[4393] The left side print head (chip) and the right side print head that form the complete bi-lithic print head have different nozzle arrangements with respect to the dot order mapping of the dot shift register to the dot position on the page.

[4394] With this mapping, the following data streams will need to be provided.

Size  Head   Nozzles       Dot order
7:3   Left   9744 (n-m)    [13822, 13820, 13818, . . . , 4084, 4082, 4080] line y + 5
                           [4081, 4083, 4085, . . . , 13819, 13821, 13823] line y
      Right  4080 (m)      [1, 3, 5, . . . , 4075, 4077, 4079] line y
                           [4078, 4076, 4074, . . . , 4, 2, 0] line y + 5
6:4   Left   8328 (n-m)    [13822, 13820, 13818, . . . , 5500, 5498, 5496] line y + 5
                           [5497, 5499, 5501, . . . , 13819, 13821, 13823] line y
      Right  5496 (m)      [1, 3, 5, . . . , 5491, 5493, 5495] line y
                           [5494, 5492, 5490, . . . , 4, 2, 0] line y + 5
5:5   Left   6912 (n-m)    [13822, 13820, 13818, . . . , 6916, 6914, 6912] line y + 5
                           [6913, 6915, 6917, . . . , 13819, 13821, 13823] line y
      Right  6912 (m)      [1, 3, 5, . . . , 6907, 6909, 6911] line y
                           [6910, 6908, 6906, . . . , 4, 2, 0] line y + 5
4:6   Left   5496 (n-m)    [13822, 13820, 13818, . . . , 8332, 8330, 8328] line y + 5
                           [8329, 8331, 8333, . . . , 13819, 13821, 13823] line y
      Right  8328 (m)      [1, 3, 5, . . . , 8323, 8325, 8327] line y
                           [8326, 8324, 8322, . . . , 4, 2, 0] line y + 5
3:7   Left   4080 (n-m)    [13822, 13820, 13818, . . . , 9748, 9746, 9744] line y + 5
                           [9745, 9747, 9749, . . . , 13819, 13821, 13823] line y
      Right  9744 (m)      [1, 3, 5, . . . , 9739, 9741, 9743] line y
                           [9742, 9740, 9738, . . . , 4, 2, 0] line y + 5
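The dot orders listed above follow a regular pattern, which the following sketch generates for any head pairing. It assumes n = 13824 dots per line, m dots on the right head with m even, and the 5 line odd/even spacing of the 80 um unit cell; the function name and dictionary keys are hypothetical.

```python
def head_dot_orders(total_dots, right_dots):
    """Return the dot-index streams for the left and right heads.

    total_dots is n and right_dots is m in the table above. The left head
    covers dots m..n-1 and the right head covers dots 0..m-1.
    """
    n, m = total_dots, right_dots
    return {
        # Left head: even dots descending (line y + 5), then odd ascending (line y).
        "left_line_y_plus_5": list(range(n - 2, m - 1, -2)),
        "left_line_y": list(range(m + 1, n, 2)),
        # Right head: odd dots ascending (line y), then even descending (line y + 5).
        "right_line_y": list(range(1, m, 2)),
        "right_line_y_plus_5": list(range(m - 2, -1, -2)),
    }


orders = head_dot_orders(13824, 4080)   # the 7:3 pairing
for name, stream in orders.items():
    print(name, stream[:3], "...", stream[-3:], len(stream))
```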

[4395] The data needs to be multiplexed onto the data pins, such that Data[0] has {(C0, C1, C2), (C0, C1, C2) . . . } in the above order, and Data[1] has {(C3, C4, C5), (C3, C4, C5) . . . }.
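As a minimal sketch of this multiplexing, the function below interleaves six colour planes onto two serial streams in a given dot order. The representation of a colour plane as an indexable sequence, and the function name, are assumptions made for illustration.

```python
def multiplex_dot_data(colour_planes, dot_order):
    """Interleave six colour planes onto two data streams.

    colour_planes[c][d] is the bit for colour c at dot position d.
    dot_order lists dot positions in the order they are shifted in.
    Returns (data0, data1): colours 0-2 on Data[0], colours 3-5 on Data[1].
    """
    data0, data1 = [], []
    for dot in dot_order:
        data0.extend(colour_planes[c][dot] for c in (0, 1, 2))
        data1.extend(colour_planes[c][dot] for c in (3, 4, 5))
    return data0, data1


# Tiny example: tag every bit with (colour, dot) so the ordering is visible.
planes = [[(c, d) for d in range(4)] for c in range(6)]
data0, data1 = multiplex_dot_data(planes, dot_order=[3, 1])
print(data0)
print(data1)
```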

[4396] FIG. 311 shows the timing of data transfer during normal printing mode. Note SrClk has a default state of high and data is transferred on both edges of SrClk. If there are L nozzles per colour, SrClk would have L+2 edges, where the first and last edges do not transfer data.

[4397] Data requires a setup and hold about both edges of SrClk. Data transfer starts on the first rising edge after LSyncL rising. The SrClk default state is high, and it needs to return to high after the last data of the line. This means that on the first edge of SrClk (falling) after LSyncL rising, and on the last edge of SrClk as it returns to the default state, no data is transferred to the print head. LSyncL rising requires setup to the first falling SrClk, and must stay high during the entire line data transfer until after the last rising SrClk.

[4398] 3.3 Fire Shift Register

[4399] The fire shift register controls the rate of nozzle fire. If the register is full of ‘1’s then you could print the entire print head in a single FrClk cycle, although electrical current limitations will prevent this happening in any reasonable implementation.

[4400] Ideally, a ‘1’ is shifted in to the fire shift register in every n^(th) position, and a ‘0’ in all other positions. In this manner, after n cycles of FrClk, the entire print head will be printed.

[4401] The fire shift register and select shift registers allow the generation of a horizontal print line that on close inspection would not have a discontinuity of a “saw tooth” pattern, FIGS. 312a) & b), but a “sharks tooth” pattern of c).

[4402] This is done by firing 2 nozzles in every 2n group of nozzles at the same time, starting from the outer 2 nozzles and working towards the centre two (or starting from the centre and working towards the outer two), at the fire rate controlled by FrClk.

[4403] To achieve this fire pattern the fire shift register and select shift register need to be set up as shown in FIG. 313.

[4404] The pattern has shifted a ‘1’ into the fire shift register every n^(th) position (where n is usually a minimum of about 100) and n ‘1’s followed by n ‘0’s in the select shift register. At the start of a print cycle, these patterns need to be aligned as above, with the “1000 . . . ” of a forward half of the fire shift register matching an n grouping of ‘1’s or ‘0’s in the select shift register. As well, the “1000 . . . ” of a reverse half of the fire shift register must match an n grouping of ‘1’s or ‘0’s in the select shift register. And to continue this print pattern across the butt ends of the chips, the select shift register in each should end with a complete block of n ‘1’s (or ‘0’s).
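The difference between the “saw tooth” and “sharks tooth” patterns can be illustrated with a simplified timing model. This is not the chip's register implementation: it assumes the forward fire bit starts at the first column of each n group and moves up one column per FrClk, the reverse fire bit starts at the last column of each group and moves down, and the select pattern is n ‘1’s followed by n ‘0’s. All function names are hypothetical.

```python
def select_pattern(columns, n, first_bit=1):
    """n copies of first_bit, then n of its inverse, repeating."""
    return [first_bit ^ ((c // n) % 2) for c in range(columns)]


def fire_cycle(column, n, select_bit):
    """FrClk cycle (0..n-1) on which a column fires in this model."""
    offset = column % n
    # '0' selects the forward register, '1' selects the reverse register.
    return offset if select_bit == 0 else n - 1 - offset


def firing_times(columns, n, select):
    return [fire_cycle(c, n, select[c]) for c in range(columns)]


def max_adjacent_step(times):
    """Largest firing-time jump between neighbouring columns."""
    return max(abs(a - b) for a, b in zip(times, times[1:]))


n, columns = 4, 16
sharks_tooth = firing_times(columns, n, select_pattern(columns, n))
saw_tooth = firing_times(columns, n, [0] * columns)   # forward register only

print("sharks tooth:", sharks_tooth, "max step", max_adjacent_step(sharks_tooth))
print("saw tooth:   ", saw_tooth, "max step", max_adjacent_step(saw_tooth))
```

In this model, neighbouring columns never differ by more than one FrClk cycle when the select register alternates, whereas using only the forward register produces a jump of n − 1 cycles at every group boundary.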

[4405] Since the two chips can be of different lengths, initialisation of these patterns is an issue. This is solved by building initialisation circuitry into the chips. This circuit is controlled by three registers, nlen(14), count(14) and b(1). These registers are loaded serially through Data[0], while LSyncL is low and ReadL is high, with FrClk.

[4406] The scan order from input is b, n[13-0], c[0-13], color[5-0], mode[2-0]; therefore b is shifted in last. The system color and mode registers are unrelated to the Fire Shift Register, but are loaded at the same time as this block. Their function is described later.

TABLE 4 Head Combinations Initialisation for n = 100

Nozzles  Nozzles  nlen_(A&B)  count_(A) =          b_(A)  b_(B)  rem =            count_(B) =
L_(B)    L_(A)    = n − 1     (L_(A)/2) mod n − 1                (L_(B)/2) mod n  (L_(A) − L_(B) + rem) mod n − 1
4080     9744     99          71                   0      0      40               3
5496     8328     99          63                   0      0      48               79
6912     6912     99          55                   0      0      56               55

[4407] Table 4 shows the values to programme the bi-lithic head pairs using a fire pattern length of 100. The calculation assumes head ‘A’ is the longest head of the pair, and that the registers are initialised with L_(A) FrClk cycles (ReadL=‘0’, LSyncL=‘1’). rem would be the correct value for count_(B) if chip B was only clocked (FrClk) L_(B) times. But this chip will be over-clocked by L_(A)−L_(B) cycles. The values of b_(A) and b_(B) are either the same or the inverse of each other. The actual value does not matter. They need to be different from each other if the select shift registers would otherwise end up with different values at the butt ends. If (L_(A)/2n) is even (and count_(A) is non zero), then the final run in ‘A’s select shift register will be !b_(A). If (L_(A)−L_(B)/2) mod n is even (and count_(B) is non zero), then the final run in ‘B’s select shift register will be !b_(B).
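The register values in Table 4 can be reproduced from the column formulas. The sketch below applies them as written; the function name is an assumption, and the case where a modulo result is 0 (which would give a count of −1) is not covered by the table and is left unhandled here.

```python
def init_registers(l_a, l_b, n=100):
    """Compute nlen, count_A, rem and count_B for a head pair.

    l_a is the nozzle count of the longer head 'A', l_b that of head 'B',
    and n is the fire pattern length. Formulas follow Table 4.
    """
    rem = (l_b // 2) % n
    return {
        "nlen": n - 1,
        "count_a": (l_a // 2) % n - 1,
        "rem": rem,
        "count_b": (l_a - l_b + rem) % n - 1,
    }


for l_b, l_a in [(4080, 9744), (5496, 8328), (6912, 6912)]:
    print(l_b, l_a, init_registers(l_a, l_b))
```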

[4408] 3.4 System Registers

[4409] As described above, the Fire Shift Register generation block also contains some system registers.

TABLE 5 System Registers

Name   Size  Function
Color  6     Each bit is an enable for the corresponding colour.
             If color[X] = 0, then Pr_(x) is 0 and SrClk_(x) is 0.
             If color[X] = 1, then Pr_(x) follows the Pr signal and SrClk_(x) is deserialised SrClk.
Mode   3     Mode[0] = 1, then FrClk pin is used as an output, internally the FrClk signal is set to 0.
             Mode[1] = 1, then Pr pin is used as an output, internally the Pr signal is set to 0.
             Mode[2] = 1, then LsyncL pin is used as an output, internally the LsyncL signal is set to 1.

[4410] 3.5 Profile Pattern

[4411] A profile pattern is repeated at the FrClk rate. It is expected to be a single pulse about 1 us long, but it could be a more complicated series of pulses. The actual pattern depends on the ink type.

[4412] The following figure shows the external timing to print a line of data. In this example the line is printed in 8 cycles of FrClk.

[4413] 3.6 Interface Modes

[4414] The print head has eight different modes controlled by signals ReadL and LSyncL and the system mode register. As seen in FIG. 318, with both LSyncL and ReadL high, the chip is in normal printing mode. Some of these modes can operate at the same time, but may interfere with the result of the other modes.

TABLE 6 Print Head Modes

ReadL  LSyncL  Mode Register  Function
1      1       000 (XXX)      Normal Print Mode
                              Internal mapping: SrClk = SrClk/3, frclk = FrClk, SelClk = 0, FsClk = FrClk, Scan = 0, CoreScan = 0
X      0       000 (XXX)      Dot Load Mode
                              Dot latches are open, loaded with Dot shift registers, latch once LSyncL returns to 1 (this happens regardless of ReadL). Enables Dot Shift register to capture fire result.
1      0       000 (XXX)      Fire Load Mode
                              Internal mapping: SrClk = X, frclk = X, SelClk = X, FsClk = FrClk, Scan = 1, CoreScan = X
                              Data[0] will shift through mode, color, nlen, count and b with FrClk.
0      1       001            Reset Nozzle Test
                              Internal mapping: SrClk = SrClk, FrClk = FrClk, SelClk = FrClk, FsClk = FrClk, Scan = 0, CoreScan = 1
                              Resets the state of nozzle test circuit.
0      1       111            CMOS testing mode
                              The contents of the dot shift registers are serial shifted out on LsyncL (colour0-1), FrClk (colour2-3), Pr (colour4-5) with SrClk.
0      1       000 (XX0)      Fire Initialise mode
                              The contents of the fire shift register and select shift register is generated with FrClk.
0      0       010            Temperature Output
                              Internal mapping: SrClk = X, frclk = 0, SelClk = 0, FsClk = 0, Scan = 0, CoreScan = X
                              The series of Sigma Delta output are clocked out on Pr with FrClk. The sum of these bits represent the temperature of the chip.
0      0       001            Nozzle Test Output
                              The result of a nozzle test is output on FrClk.
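One way to read Table 6 is as a decode of ReadL, LSyncL and the 3-bit mode register into the set of active modes. The sketch below is an interpretation rather than the chip's logic: it assumes “(XXX)” means the mode register is a don't-care, that “(XX0)” requires only mode bit 0 to be clear, and that Dot Load Mode is active whenever LSyncL is low. The function name is hypothetical.

```python
def active_modes(read_l, lsync_l, mode):
    """Return the Table 6 modes selected by the pins and the mode register."""
    modes = []
    if lsync_l == 0:
        modes.append("Dot Load Mode")          # regardless of ReadL
    if read_l == 1 and lsync_l == 1:
        modes.append("Normal Print Mode")
    if read_l == 1 and lsync_l == 0:
        modes.append("Fire Load Mode")
    if read_l == 0 and lsync_l == 1:
        if mode == 0b001:
            modes.append("Reset Nozzle Test")
        elif mode == 0b111:
            modes.append("CMOS testing mode")
        elif mode & 0b001 == 0:                # assumed reading of (XX0)
            modes.append("Fire Initialise mode")
    if read_l == 0 and lsync_l == 0:
        if mode == 0b010:
            modes.append("Temperature Output")
        elif mode == 0b001:
            modes.append("Nozzle Test Output")
    return modes


print(active_modes(1, 1, 0b000))
print(active_modes(1, 0, 0b000))
print(active_modes(0, 1, 0b000))
print(active_modes(0, 0, 0b010))
```

Returning a list rather than a single value reflects the statement that some modes can operate at the same time.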

[4415] 3.6.1 Printing

[4416] FIG. 318 shows the timing for normal printing. During this action, we drop out of Normal Print Mode to Dot Load Mode between line transfers. For printing to perform correctly, all other signals should be stable.

[4417] 3.6.2 Initialising for Printing

[4418] To initialise for printing, the fire shift registers and select shift registers need to be set up into a state as shown in FIG. 318. To do this the chips are put into Fire Load Mode and the values for nlen, count and b are serially shifted from Data[0], clocked by FrClk. As the two chips have separate Data lines and a common FrClk, this happens at the same time. Once this is done, the mode is changed to Fire Initialise Mode, and a further L_(A) FrClk cycles are provided to both chips. During all these operations Pr should be low, to prevent unintentional firing of nozzles.

[4419] 3.6.3 Nozzle Testing

[4420] Nozzle testing is done by firing a single nozzle at a time and monitoring the FrClk pin in the Nozzle Test Output mode.

[4421] Each nozzle has a test switch which closes when the nozzle is fired with an energy level greater than required for normal ink ejection. All 12 switches in a nozzle column are connected in parallel to the following circuit.

[4422] This circuit is initialised whenever LSyncL is high and ReadL is low (Reset Nozzle Test mode). This forces all “switch nodes” low, and the feedback through the lower NOR gate will latch this value. With LSyncL low and ReadL still low (Nozzle Test Output mode), the Testout of the first nozzle column is output on FrClk. If any switch is closed, the switch node of this column will be pulled up, and this will ripple through to the output as a transition from high to low.

[4423] Nozzle testing requires a setup phase in order to fire only one nozzle. There are many ways to achieve this. Simplest might be to load a single colour with 101010 through the even nozzles, and 010101 . . . for the odd nozzles (0's for all other colours), and set up a fire pattern with n=L_(A)/2. With this fire pattern only one nozzle will fire in each Pr pulse. After firing in Nozzle Test Output mode, a single FrClk will advance to the next nozzle, then Reset and Test. After L_(A)/2 cycles of this testing, a single SrClk will advance the dot shift registers to set up the untested nozzles of this colour, and another L_(A)/2 cycles of FrClk, Reset and Test will finish testing this colour. Then repeat the test procedure for the other colours.

[4424] 3.6.4 Temperature Output

[4425] This mode is not well defined yet. In this mode, Pr will output a series of ones and zeros clocked by FrClk. After a (currently unknown) number of FrClk cycles, the sum of this series will represent the temperature of the chip. Clocking frequency in this mode is expected to be in the range 10 kHz-1 MHz.

[4426] The frequency of FrClk and the number of cycles need to be programmable. Since this mode cycles FrClk, the contents of the fire shift register and select shift register would otherwise be changed, but in this mode FrClk is disabled to these circuits. So printing can resume without reinitialising.

[4427] 3.6.5 CMOS Testing

[4428] CMOS testing is a mode meant for chip testing before MEMS is added to the chip. This mode allows the dot shift register to be shifted out on the LsyncL, FrClk and Pr pins. Much like the nozzle test mode, the nozzles are fired while LSyncL is low, but during the firing SrClk will be pulsed, loading the dot shift register with the signal that would fire the nozzle. Once captured, the result can be shifted out.

[4429] The Dot Load Mode above violates normal printing procedure by firing the nozzles (Pr) and modifying the dot shift register (SrClk).

[4430] 4 Reticle Layout

[4431] To make long chips we need to stitch the CMOS (and MEMS) together by overlapping the reticle stepping field. The reticle will contain two areas:

[4432] The top edge of Area 2, PAD END, contains the pads that stitch on the bottom edge of Area 1, CORE. Area 1 contains the core array of nozzle logic. The top edge of Area 1 will stitch to the bottom edge of itself. Finally the bottom edge of Area 2, BUTT END, will stitch to the top edge of Area 1. The BUTT END is used to complete a feedback wiring and seal the chip.

[4433] The above region will then be exposed across a wafer bottom to top: Area 2, Area 1, Area 1 . . . , Area 2. Only the PAD END of Area 2 needs to fit on the wafer. The final exposure of Area 2 only requires the BUTT END on the wafer.

[4434] 4.1 TSMC U-Frame Requirements.

[4435] TSMC will be building us frames 10 mm×0.23 mm which will be placed either side of both Area 1 and Area 2.

[4436] TSMC requires a 6 mm area for blading between the two exposure areas. This translates to 3 mm on the reticle; as some reticles are 2× size, while most are 5×, the worst case must be used.

[4437] Security Overview

[4438] 1 Introduction

[4439] A number of hardware, software and protocol solutions to security issues have been developed. These range from authorization and encryption protocols for enabling secure communication between hardware and software modules, to physical and electrical systems that protect the integrity of integrated circuits and other hardware.

[4440] It should be understood that in many cases, principles described with reference to hardware such as integrated circuits (ie, chips) can be implemented wholly or partly in software running on, for example, a computer. Mixed systems in which software and hardware (and combinations) embody various entities, modules and units can also be constructed using many of these principles, particularly in relation to authorization and authentication protocols. The particular extent to which the principles described below can be translated to or from hardware or software will be apparent to one skilled in the art, and so will not always explicitly be explained.

[4441] It should also be understood that many of the techniques disclosed below have application to many fields other than printing. Some specific examples are described towards the end of this description.

[4442] A “QA Chip” is a quality assurance chip that allows certain security functions and protocols to be implemented. The preferred QA Chip is described in some detail later in this specification.

[4443] 1.5 QA Chip Terminology

[4444] The Authentication Protocols documents [5] and [6] refer to QA Chips by their function in particular protocols:

[4445] For authenticated reads in [5], ChipR is the QA Chip being read from, and ChipT is the QA Chip that identifies whether the data read from ChipR can be trusted. ChipR and ChipT are referred to as Untrusted QA Device and Trusted QA Device respectively in [6].

[4446] For replacement of keys in [5], ChipP is the QA Chip being programmed with the new key, and ChipF is the factory QA Chip that generates the message to program the new key. ChipF is referred to as the Key Programmer QA Device in [6].

[4447] For upgrades of data in memory vectors in [5], ChipU is the QA Chip being upgraded, and ChipS is the QA Chip that signs the upgrade value. ChipS is referred to as the Value Upgrader QA Device and Parameter Upgrader QA Device in [6].

[4448] Any given physical QA Chip will contain functionality that allows it to operate as an entity in some number of these protocols.

[4449] Therefore, wherever the terms ChipR, ChipT, ChipP, ChipF, ChipU and ChipS are used in this document, they are referring to logical entities involved in an authentication protocol as defined in [5] and [6].

[4450] Physical QA Chips are referred to by their location. For example, each ink cartridge may contain a QA Chip referred to as an INK_QA, with all INK_QA chips being on the same physical bus. In the same way, the QA Chip inside the printer is referred to as PRINTER_QA, and will be on a separate bus to the INK_QA chips.

[4451] 2 Requirements

[4452] 2.1 Security

[4453] When applied to a printing environment, the functional security requirements for the preferred embodiment are:

[4454] Code of QA chip owner or licensee co-existing safely with code of authorized OEMs

[4455] Chip owner/licensee operating parameters authentication

[4456] Parameters authentication for authorized OEMs

[4457] Ink usage authentication

[4458] Each of these is outlined in subsequent sections.

[4459] The authentication requirements imply that:

[4460] OEMs and end-users must not be able to replace or tamper with QA chip manufacturer/owner's program code or data

[4461] OEMs and end-users must not be able to perform unauthorized activities, for example by calling chip manufacturer/owner's code

[4462] End-users must not be able to replace or tamper with OEM program code or data

[4463] End-users must not be able to call unauthorized functions within OEM program code

[4464] Manufacturer/owner's development program code must not be capable of running on all SoPECs.

[4465] OEMs must be able to test products at their highest upgradable status, yet not be able to ship them outside the terms of their license

[4466] OEMs and end-users must not be able to directly access the print engine pipeline (PEP) hardware, the LSS Master (for QA Chip access) or any other peripheral block with the exception of operating system permitted GPIO pins and timers.

[4467] 2.1.1 QA Manufacturer/Owner Code and OEM Program Code Co-Existing Safely

[4468] SoPEC includes a CPU that must run both manufacturer/owner program code and OEM program code. The execution model envisaged for SoPEC is one where Manufacturer/owner program code forms an operating system (O/S), providing services such as controlling the print engine pipeline, interfaces to communications channels etc. The OEM program code must run in a form of user mode, protected from harming the Manufacturer/owner program code. The OEM program code is permitted to obtain services by calling functions in the O/S, and the O/S may also call OEM code at specific times. For example, the OEM program code may request that the O/S call an OEM interrupt service routine when a particular GPIO pin is activated.

[4469] In addition, we may wish to permit the OEM code to directly call functions in Manufacturer/owner code with the same permissions as the OEM code. For example, the Manufacturer/owner code may provide SHA1 as a service, and the OEM could call the SHA1 function, but execute that function with OEM permissions and not Manufacturer/owner permissions.

[4470] A basic requirement then, for SoPEC, is a form of protection management, whereby Manufacturer/owner and OEM program code can co-exist without the OEM program code damaging operations or services provided by the Manufacturer/owner O/S. Since services rely on SoPEC peripherals (such as USB2 Host, LSS Master, Timers etc), access to these peripherals should also be restricted to Manufacturer/owner program code only.

[4471] 2.1.2 Manufacturer/Owner Operating Parameters Authentication

[4472] A particular OEM will be licensed to run a Print Engine with a particular set of operating parameters (such as print speed or quality). The OEM and/or end-user can upgrade the operating license for a fee and thereby obtain an upgraded set of operating parameters.

[4473] Neither the OEM nor end-user should be able to upgrade the operating parameters without paying the appropriate fee to upgrade the license. Similarly, neither the OEM nor end-user should be able to bypass the authentication mechanism via any program code on SoPEC. This implies that OEMs and end-users must not be able to tamper with or replace Manufacturer/owner program code or data, nor be able to call unauthorized functions within Manufacturer/owner program code.

[4474] However, the OEM must be capable of assembly-line testing the Print Engine at the upgraded status before selling the Print Engine to the end-user.

[4475] 2.1.3 OEM Operating Parameters Authentication

[4476] The OEM may provide operating parameters to the end-user independent of the Manufacturer/owner operating parameters. For example, the OEM may want to sell a franking machine¹.

[4477] The end-user should not be able to upgrade the operating parameters without paying the appropriate fee to the OEM. Similarly, the end-user should not be able to bypass the authentication mechanism via any program code on SoPEC. This implies that end-users must not be able to tamper with or replace OEM program code or data, as well as not be able to tamper with the PEP blocks or service-related peripherals.

[4478] 2.2 Acceptable Compromises

[4479] If an end user takes the time and energy to hack the print engine and thereby succeeds in upgrading that single print engine only, yet is not able to use the same keys etc on another print engine, that is an acceptable security compromise. However it doesn't mean we have to make it totally simple or cheap for the end-user to accomplish this.

[4480] Software-only attacks are the most dangerous, since they can be transmitted via the internet and have no perceived cost. Physical modification attacks are far less problematic, since most printer users are not likely to want their print engine to be physically modified. This is even more true if the cost of the physical modification is likely to exceed the price of a legitimate upgrade.

[4481] 2.3 Implementation Constraints

[4482] Any solution to the requirements detailed in Section 2.1 should also meet certain preferred implementation constraints. These are:

[4483] No flash memory inside SoPEC

[4484] SoPEC must be simple to verify

[4485] Manufacturer/owner program code must be updateable

[4486] OEM program code must be updateable

[4487] Must be bootable from activity on USB2

[4488] Must be bootable from an external ROM to allow stand-alone printer operation

[4489] No extra pins for assigning IDs to slave SoPECs

[4490] Cannot trust the comms channel to the QA Chip in the printer (PRINTER_QA)

[4491] Cannot trust the comms channel to the QA Chip in the ink cartridges (INK_QA)

[4492] Cannot trust the USB comms channel

[4493] These constraints are detailed below.

[4494] 2.3.1 No Flash Memory Inside SoPEC

[4495] The preferred embodiment of SoPEC is intended to be implemented in 0.13 micron or smaller. Flash memory will not be available in any of the target processes being considered.

[4496] 2.3.2 SoPEC Must be Simple to Verify

[4497] All combinatorial logic and embedded program code within SoPEC must be verified before manufacture. Every increase in complexity in either of these increases verification effort and increases risk.

[4498] 2.3.3 Manufacturer/Owner Program Code Must be Updateable

[4499] It is neither possible nor desirable to write a single complete operating system that is:

[4500] verified completely (see Section 2.3.2)

[4501] correct for all possible future uses of SoPEC systems

[4502] finished in time for SoPEC manufacture

[4503] Therefore the complete Manufacturer/owner program code must not permanently reside on SoPEC. It must be possible to update the Manufacturer/owner program code as enhancements to functionality are made and bug fixes are applied.

[4504] In the worst case, only new printers would receive the new functionality or bug fixes. In the best case, existing SoPEC users can download new embedded code to enable functionality or bug fixes. Ideally, these same users would be obtaining these updates from the OEM website or equivalent, and not require any interaction with Manufacturer/owner.

[4505] 2.3.4 OEM Program Code Must be Updateable

[4506] Given that each OEM will be writing specific program code for printers that have not yet been conceived, it is impossible for all OEM program code to be embedded in SoPEC at the ASIC manufacture stage.

[4507] Since flash memory is not available (see Section 2.3.1), OEMs cannot store their program code in on-chip flash. While it is theoretically possible to store OEM program code in ROM on SoPEC, this would entail OEM-specific ASICs which would be prohibitively expensive. Therefore OEM program code cannot permanently reside on SoPEC.

[4508] Since OEM program code must be downloadable for SoPEC to execute, it should therefore be possible to update the OEM program code as enhancements to functionality are made and bug fixes are applied.

[4509] In the worst case, only new printers would receive the new functionality or bug fixes. In the best case, existing SoPEC users can download new embedded code to enable functionality or bug fixes. Ideally, these same users would be obtaining these updates from the OEM website or equivalent, and not require any interaction with Manufacturer/owner.

[4510] 2.3.5 Must be Bootable From Activity on USB2

[4511] SoPEC can be placed in sleep mode to save power when printing is not required. RAM is not preserved in sleep mode. Therefore any program code and data in RAM will be lost. However, SoPEC must be capable of being woken up by the host when it is time to print again.

[4512] In the case of a single SoPEC system, the host communicates with SoPEC via USB2. From SoPEC's point of view, it is activity on the USB2 device port that signals the time to wake up.

[4513] In the case of a multi-SoPEC system, the host typically communicates with the Master SoPEC chip (as above), and then the Master relays messages to other Slave SoPECs by sending data out USB2 host port(s) and into the Slave SoPEC's device port. The net result is that the Slave SoPECs and the Master SoPEC all boot as a result of activity on the USB2 device port.

[4514] Therefore SoPEC must be capable of being woken up by activity on the USB2 device port.

[4515] 2.3.6 Must be Bootable From an External ROM to Allow Stand-Alone Printer Operation

[4516] SoPEC must also support the case where the printer is not connected to a PC (or the PC is currently turned off), and a digital camera or equivalent is plugged into the SoPEC-based printer. In this case, the entire printing application needs to be present within the hardware of the printer. Since the Manufacturer/owner program code and OEM program code will vary depending on the application (see Section 2.3.3 and Section 2.3.4), it is not possible to store the program in SoPEC's ROM.

[4517] Therefore SoPEC requires a means of booting from a non-PC host. It is possible that this could be accomplished by the OEM adding a USB2-host chip to the printer and simulating the effect of a PC, thereby downloading the program code. This solution requires the boot operation to be based on USB2 activity (see Section 2.3.5). However this is an unattractive solution since it adds microprocessor complexity and component cost when only a ROM-equivalent was desired.

[4518] As a result SoPEC should ideally be able to boot from an external ROM of some kind. Note that booting from an external ROM means first booting from the internal ROM, and then downloading and authenticating the startup section of the program from the external ROM. This is not the same as simply running program code in-situ within an external ROM, since one of the security requirements was that OEMs and end-users must not be able to replace or tamper with Manufacturer/owner program code or data, i.e. we never want to blindly run code from an external ROM.

[4519] As an additional point, if SoPEC is in sleep mode, SoPEC must be capable of instigating the boot process due to activity on a programmable GPIO, e.g. a wake-up button. This would be in addition to the standard power-on booting.

[4520] 2.3.7 No Extra Pins to Assign IDs to Slave SoPECs

[4521] In a single SoPEC system the host only sends data to the single SoPEC. However in a multi-SoPEC system, each of the slaves needs to be uniquely identifiable in order for the host to be able to send data to the correct slave.

[4522] Since there is no flash on board SoPEC (Section 2.3.1) we are unable to store a slave ID in each SoPEC. Moreover, any ROM in each SoPEC will be identical.

[4523] It is possible to assign n pins to allow 2^(n) combinations of IDs for slave SoPECs. However a design goal of SoPEC is to minimize pins for cost reasons, and this is particularly true of features only used in multi-SoPEC systems.

[4524] The design constraint requirement is therefore to allow slaves to be IDed via a method that does not require any extra pins. This implies that whatever boot mechanism satisfies the security requirements of Section 2.1 must also be able to assign IDs to slave SoPECs.

[4525] 2.3.8 Cannot Trust the Comms Channel to the QA Chip in the Printer (PRINTER_QA)

[4526] If the printer operating parameters are stored in the non-volatile memory of the Print Engine's on-board PRINTER_QA chip, neither Manufacturer/owner nor OEM program code can rely on the communication channel being secure. It is possible for an attacker to eavesdrop on communications to the PRINTER_QA chip, replace the PRINTER_QA chip and/or subvert the communications channel. It is also possible for this to be true during manufacture of the circuit board containing the SoPEC and the PRINTER_QA chip.

[4527] 2.3.9 Cannot Trust the Comms Channel to the QA Chip in the Ink Cartridges (INK_QA)

[4528] The amount of ink remaining for a given ink cartridge is stored in the non-volatile memory of that ink cartridge's INK_QA chip. Neither Manufacturer/owner nor OEM program code can rely on the communication channel to the INK_QA being secure. It is possible for an attacker to eavesdrop on communications to the INK_QA chip, to replace the INK_QA chip and/or to subvert the communications channel. It is also possible for this to be true during manufacture of the consumable containing the INK_QA chip.

[4529] 2.3.10 Cannot Trust the Inter-SoPEC Comms Channel (USB2)

[4530] In a multi-SoPEC system, or in a single-SoPEC system that has a non-USB2 connection to the host, a given SoPEC will receive its data over a USB2 host port. It is quite possible for an end-user to insert a chip that eavesdrops on and/or subverts the communications channel (for example, by performing man-in-the-middle attacks).

[4531] 3 Proposed Solution

[4532] A proposed solution to the requirements of Section 2 can be summarised as:

[4533] Each SoPEC has a unique id

[4534] CPU with user/supervisor mode

[4535] Memory Management Unit

[4536] The unique id is not cached

[4537] Specific entry points in O/S

[4538] Boot procedure, including authentication of program code and operating parameters

[4539] SoPEC physical identification

[4540] 3.1 Each SoPEC Has a Unique ID

[4541] Each SoPEC needs to contain a unique SoPEC_id of minimum size 64 bits. This SoPEC_id is used to form a symmetric key unique to each SoPEC: SoPEC_id_key. On SoPEC we make use of an additional 112-bit ECID² macro that has been programmed with a random number on a per-chip basis. Thus SoPEC_id is the 112-bit macro, and the SoPEC_id_key is a 160-bit result obtained by SHA1(SoPEC_id).
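The key derivation just described can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the random 14-byte value merely stands in for the number programmed into the 112-bit ECID macro.

```python
import hashlib
import secrets

SOPEC_ID_BYTES = 14  # 112 bits, the size of the ECID macro described above


def make_sopec_id() -> bytes:
    """Hypothetical stand-in for the per-chip random number in the ECID macro."""
    return secrets.token_bytes(SOPEC_ID_BYTES)


def derive_sopec_id_key(sopec_id: bytes) -> bytes:
    """SoPEC_id_key = SHA1(SoPEC_id): a 160-bit symmetric key unique to the chip."""
    if len(sopec_id) != SOPEC_ID_BYTES:
        raise ValueError("SoPEC_id is expected to be 112 bits (14 bytes)")
    return hashlib.sha1(sopec_id).digest()
```

Because the key is a pure function of SoPEC_id, anyone who learns the id also learns the key, which is why the id must be difficult to determine.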

[4542] The verification of operating parameters and ink usage depends on SoPEC_id being difficult to determine. Difficult to determine means that someone should not be able to determine the id via software, or by viewing the communications between chips on the board. If the SoPEC_id is available through running a test procedure on specific test pins on the chip, then depending on the ease by which this can be done, it is likely to be acceptable.

[4543] It is important to note that in the proposed solution, compromise of the SoPEC_id leads only to compromise of the operating parameters and ink usage on this particular SoPEC. It does not compromise any other SoPEC or all inks or operating parameters in general.

[4544] It is ideal that the SoPEC_id be random, although this is unlikely to occur on standard manufacture processes for ASICs. If the id is within a small range however, it will be able to be broken by brute force. This is why 32 bits is not sufficient protection.

[4545] 3.2 CPU with User/Supervisor Mode

[4546] SoPEC contains a CPU with direct hardware support for user and supervisor modes. At present, the intended CPU is the LEON, a 32-bit processor with an instruction set according to the IEEE-1754 standard (the IEEE-1754 standard is compatible with the SPARC V8 instruction set).

[4547] Manufacturer/owner (operating system) program code will run in supervisor mode, and all OEM program code will run in user mode.

[4548] 3.3 Memory Management Unit

[4549] SoPEC contains a Memory Management Unit (MMU) that limits access to regions of DRAM by defining read, write and execute access permissions for supervisor and user mode. Program code running in user mode is subject to user mode permission settings, and program code running in supervisor mode is subject to supervisor mode settings.

[4550] A setting of 1 for a permission bit means that type of access (e.g. read, write, execute) is permitted. A setting of 0 for a permission bit means that that type of access is not permitted.

[4551] At reset and whenever SoPEC wakes up, the settings for all the permission bits are 1 for all supervisor mode accesses, and 0 for all user mode accesses. This means that supervisor mode program code must explicitly set user mode access to be permitted on a section of DRAM.

[4552] Access to all the non-valid address space should be trapped, regardless of user or supervisor mode, and regardless of the access being read, execute, or write.

[4553] Access permission to all of the valid non-DRAM address space (for example the PEP blocks) is supervisor read/write access only (no supervisor execute access, and user mode has no access at all) with the exception that certain GPIO and Timer registers can also be accessed by user code. These registers will require bitwise access permissions. Each peripheral block will determine how the access is restricted.

[4554] With respect to the DRAM and PEP subsystems of SoPEC, typically we would set user mode read/write/execute permissions to be 1/1/0 only in the region of memory that is used for OEM program data, 1/0/1 for regions of OEM program code, and 0/0/0 elsewhere (including the trap table). By contrast we would typically set supervisor mode read/write/execute permissions for this memory to be 1/1/0 (to avoid accidentally executing user code in supervisor mode).
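The typical permission settings above can be summarised in a small table lookup. The following is a sketch only: the region names and function names are hypothetical, and treating every other region as still holding its reset value in supervisor mode is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perm:
    read: int
    write: int
    execute: int


# Reset / wake-up defaults: supervisor may do everything, user nothing.
RESET_SUPERVISOR = Perm(1, 1, 1)
RESET_USER = Perm(0, 0, 0)

# Typical settings for the OEM regions (region names are hypothetical).
TYPICAL_USER = {"oem_data": Perm(1, 1, 0), "oem_code": Perm(1, 0, 1)}
TYPICAL_SUPERVISOR = {"oem_data": Perm(1, 1, 0), "oem_code": Perm(1, 1, 0)}


def permissions(mode: str, region: str) -> Perm:
    """Return the read/write/execute bits for a mode and a memory region."""
    if mode == "user":
        return TYPICAL_USER.get(region, RESET_USER)
    # Assumption: regions not listed keep their reset value for the supervisor.
    return TYPICAL_SUPERVISOR.get(region, RESET_SUPERVISOR)


def access_allowed(mode: str, region: str, access: str) -> bool:
    """access is one of 'read', 'write' or 'execute'."""
    return getattr(permissions(mode, region), access) == 1
```

For example, access_allowed("user", "trap_table", "read") is false because the trap table falls under the 0/0/0 default for user mode.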

[4555] The SoPEC_id parameter (see Section 3.1) should only be accessible in supervisor mode, and should only be stored and manipulated in a region of memory that has no user mode access.

3.4 Unique ID Is Not Cached

[4556] The unique SoPEC_id needs to be available to supervisor code and not available to user code. This is taken care of by the MMU (Section 3.3).

[4557] However the SoPEC_id must also not be accessible via the CPU's data cache or register windows. For example, if the user were to cause an interrupt to occur at a particular point in the program execution when the SoPEC_id was being manipulated, it must not be possible for the user program code to turn caching off and then access the SoPEC_id inside the data cache. This would bypass any MMU security.

[4558] The same must be true of register windows. It must not be possible for user mode program code to read or modify register settings in a supervisor program's register windows.

[4559] This means that at the least, the SoPEC_id itself must not be cacheable. Likewise, any processed form of the SoPEC_id such as the SoPEC_id_key (e.g. read into registers or calculated expected results from a QA Chip) should not be accessible by user program code.

[4560] 3.5 Specific Entry Points in O/S

[4561] Given that user mode program code cannot even call functions in supervisor code space, the question arises as to how OEM programs can access functions or request services. The implementation for this depends on the CPU.

[4562] On the LEON processor, the TRAP instruction allows programs to switch between user and supervisor mode in a controlled way. The TRAP switches between user and supervisor register sets, and calls a specific entry point in the supervisor code space in supervisor mode. The TRAP handler dispatches the service request, and then returns to the caller in user mode.

[4563] Use of a command dispatcher allows the O/S to provide services that filter access, e.g. a generalised print function will set PEP registers appropriately and ensure QA Chip ink updates occur.

[4564] The LEON also allows supervisor mode code to call user mode code in user mode. There are a number of ways that this functionality can be implemented. It is possible to call the user code without a trap, but to return to supervisor mode requires a trap (and associated latency).

[4565] 3.6 Boot Procedure

[4566] 3.6.1 Basic Premise

[4567] The intention is to load the Manufacturer/owner and OEM program code into SoPEC's RAM, where it can be subsequently executed. The basic SoPEC, therefore, must be capable of downloading program code. However SoPEC must be able to guarantee that only authorized Manufacturer/owner boot programs can be loaded, otherwise anyone could modify the O/S to do anything, and then load that, thereby bypassing the licensed operating parameters.

[4568] We perform authentication of program code and data using asymmetric (public-key) digital signatures and without using a QA Chip.

[4569] Assuming we have already downloaded some data and a 160-bit signature into eDRAM, the boot loader needs to perform the following tasks:

[4570] perform SHA-1 on the downloaded data to calculate a digest localDigest

[4571] perform asymmetric decryption on the downloaded signature (160-bits) using an asymmetric public key to obtain authorizedDigest

[4572] If authorizedDigest is the PKCS#1 (patent free) form of localDigest, then the downloaded data is authorized (the signature must have been signed with the asymmetric private key) and control can then be passed to the downloaded data

[4573] Asymmetric decryption is used instead of symmetric decryption because the decrypting key must be held in SoPEC's ROM. If a symmetric key were used, the key could be recovered by probing the ROM and the security would be compromised.

[4574] The procedure requires the following data item:

[4575] boot0key=an n-bit asymmetric public key

[4576] The procedure also requires the following two functions:

[4577] SHA-1=a function that performs SHA-1 on a range of memory and returns a 160-bit digest

[4578] decrypt=a function that performs asymmetric decryption of a message using the passed-in key

[4579] PKCS#1 form of localDigest is 2048-bits formatted as follows: bits 2047-2040=0x00, bits 2039-2032=0x01, bits 2031-288=0xFF..0xFF, bits 287-160=0x003021300906052B0E03021A05000414, bits 159-0=localDigest. For more information, see PKCS#1 v2.1 section 9.2.

[4580] Assuming that all of these are available (e.g. in the boot ROM), boot loader 0 can be defined as in the following pseudocode:

bootloader0(data, sig)
  localDigest ← SHA-1(data)
  authorizedDigest ← decrypt(sig, boot0key)
  expectedDigest = 0x00 | 0x01 | 0xFF..0xFF | 0x003021300906052B0E03021A05000414 | localDigest // "|" = concat
  If (authorizedDigest == expectedDigest)
    jump to program code at data-start address // will never return
  Else
    // program code is unauthorized
  EndIf
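A minimal sketch of the same check follows. It assumes an RSA-style public operation and a 2048-bit block; the names expected_digest, rsa_public_decrypt and the decrypt callback are illustrative and not taken from the specification. The function returns a flag where the real boot loader would jump to the downloaded code.

```python
import hashlib
import hmac
from typing import Callable

# SHA-1 DigestInfo prefix from the PKCS#1 layout given above (16 bytes).
DIGEST_INFO_SHA1 = bytes.fromhex("003021300906052B0E03021A05000414")
BLOCK_BYTES = 256  # 2048 bits


def expected_digest(data: bytes) -> bytes:
    """Build 0x00 | 0x01 | 0xFF..0xFF | DigestInfo | SHA-1(data)."""
    local_digest = hashlib.sha1(data).digest()
    pad_len = BLOCK_BYTES - 2 - len(DIGEST_INFO_SHA1) - len(local_digest)
    return b"\x00\x01" + b"\xff" * pad_len + DIGEST_INFO_SHA1 + local_digest


def rsa_public_decrypt(sig: bytes, modulus: int, exponent: int) -> bytes:
    """Raw RSA public-key operation (assumed algorithm), left-padded to the block size."""
    value = pow(int.from_bytes(sig, "big"), exponent, modulus)
    return value.to_bytes(BLOCK_BYTES, "big")


def bootloader0(data: bytes, sig: bytes, decrypt: Callable[[bytes], bytes]) -> bool:
    """Return True where the real loader would jump to the program code."""
    authorized_digest = decrypt(sig)
    return hmac.compare_digest(authorized_digest, expected_digest(data))
```

In use, decrypt would be bound to boot0key, for example `lambda s: rsa_public_decrypt(s, n, e)` with the modulus and exponent held in ROM.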

[4581] The length of the key will depend on the asymmetric algorithm chosen. The key must provide the equivalent protection of the entire QA Chip system: if the Manufacturer/owner O/S program code can be bypassed, then it is equivalent to the QA Chip keys being compromised. In fact it is worse because it would compromise Manufacturer/owner operating parameters, OEM operating parameters, and ink authentication by software downloaded off the net (e.g. from some hacker).

[4582] In the case of RSA, a 2048-bit key is required to match the 160-bit symmetric-key security of the QA Chip. In the case of ECDSA, a key length of 132 bits is likely to suffice. RSA is convenient because the patent (U.S. Pat. No. 4,405,829) expired in September 2000.

[4583] There is no advantage to storing multiple keys in SoPEC and having the external message choose which key to validate against, because a compromise of any key allows the external user to always select that key.

[4584] There is also no particular advantage to having the boot mechanism select the key (e.g. one for USB-based booting and one for external ROM booting), since a compromise of the external ROM booting key is enough to compromise all the SoPEC systems.

[4585] However, there are advantages in having multiple keys present in the boot ROM and having a wire-bonding option on the pads select which of the keys is to be used. Ideally, the pads would be connected within the package, and the selection is not available via external means once the die has been packaged. This means we can have different keys for different application areas (e.g. different uses of the chip), and if any particular SoPEC key is compromised, the die could be kept constant and only the bonding changed. Note that in the worst case of all keys being compromised, it may be economically feasible to change the boot0key value in SoPEC's ROM, since this is only a single mask change, and would be easy to verify and characterize.

[4586] Therefore the entire security of SoPEC is based on keeping the asymmetric private key paired to boot0key secure. The entire security of SoPEC is also based on keeping the program that signs (i.e. authorizes) datasets using the asymmetric private key paired to boot0key secure.

[4587] It may therefore be reasonable to have multiple signatures (and hence multiple signature programs) to reduce the chance of a single point of weakness by a rogue employee. Note that the authentication time increases linearly with the number of signatures, and requires a 2048-bit public key in ROM for each signature.

[4588] 3.6.2 Hierarchies of Authentication

[4589] Given that test programs, evaluation programs, and Manufacturer/owner O/S code need to be written and tested, and OEM program code etc. also needs to be tested, it is not secure to have a single authentication of a monolithic dataset combining Manufacturer/owner O/S, non-O/S, and OEM program code. We certainly don't want OEMs signing Manufacturer/owner program code, and Manufacturer/owner shouldn't have to be involved with the signing of OEM program code.

[4590] Therefore we require differing levels of authentication and therefore a number of keys, although the procedure for authentication is identical to the first: a section of program code contains the key and procedure for authenticating the next.

[4591] This method allows for any hierarchy of authentication, based on a root key of boot0key. For example, assume that we have the following entities:

[4592] QACo, Manufacturer/owner's QA/key company. Knows private version of boot0key, and owner of security concerns.

[4593] SoPECCo, Manufacturer/owner's SoPEC hardware/software company. Supplies SoPEC ASICs and SoPEC O/S printing software to a ComCo.

[4594] ComCo, a company that assembles Print Engines from SoPECs, Memjet printheads etc, customizing the Print Engine for a given OEM according to a license.

[4595] OEM, a company that uses a Print Engine to create a printer product to sell to the end-users. The OEM would supply the motor control logic, user interface, and casing.

[4596] The levels of authentication hierarchy are as follows:

[4597] QACo writes the boot ROM, and generates dataset1, consisting of a boot loader program that loads and validates dataset2 and QACo's asymmetric public boot1key. QACo signs dataset0 with the asymmetric private boot0key.

[4598] SoPECCo generates dataset1, consisting of the print engine security kernel O/S (which incorporates the security-based features of the print engine functionality) and the ComCo's asymmetric public key. Upon a special “formal release” request from SoPECCo, QACo signs dataset0 with QACo's asymmetric private boot0key. The print engine program code expects to see an operating parameter block signed by the ComCo's asymmetric private key. Note that this is a special “formal release” request by SoPECCo; the procedure for development versions of the program is described in Section 3.6.3.

[4599] The ComCo generates dataset3, consisting of dataset1 plus dataset2, where dataset2 is an operating parameter block for a given OEM's print engine license (according to the print engine license arrangement) signed with the ComCo's asymmetric private key. The operating parameter block (dataset2) would contain valid print speed ranges, a PrintEngineLicenseID, and the OEM's asymmetric public key. The ComCo can generate as many of these operating parameter blocks as required for any number of Print Engine Licenses, but cannot write or sign any supervisor O/S program code.

[4600] The OEM would generate dataset5, consisting of dataset3 plus dataset4, where dataset4 is the OEM program code signed with the OEM's asymmetric private key. The OEM can produce as many versions of dataset5 as it likes (e.g. for testing purposes or for updates to drivers etc) and need not involve Manufacturer/owner, QACo, or ComCo in any way.

[4601] The relationship is shown below in FIG. 325.

[4602] When the end-user uses dataset5, SoPEC itself validates dataset1 via the boot0key mechanism described in Section 3.6.1. Once dataset1 is executing, it validates dataset2, and uses dataset2 data to validate dataset4. The validation hierarchy is shown in FIG. 326.
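The chained validation can be illustrated with a short sketch in which each dataset carries the key used to check the next one. To keep the example self-contained, HMAC-SHA1 is used as a stand-in for signing and verification; this is an assumption for illustration only, since the scheme described here uses asymmetric signatures, where the public key carried in a dataset differs from the private key used to sign. All names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Optional


def standin_sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for an asymmetric signature (NOT what the real system uses)."""
    return hmac.new(key, message, hashlib.sha1).digest()


@dataclass
class Dataset:
    name: str
    payload: bytes
    next_key: Optional[bytes]  # key that validates the next dataset, if any
    signature: bytes


def make_dataset(name: str, payload: bytes, next_key: Optional[bytes],
                 signing_key: bytes) -> Dataset:
    """Sign the payload together with the embedded key for the next level."""
    signed = payload + (next_key or b"")
    return Dataset(name, payload, next_key, standin_sign(signing_key, signed))


def validate_chain(root_key: bytes, chain: List[Dataset]) -> List[str]:
    """Validate each dataset with the key supplied by the one before it."""
    key: Optional[bytes] = root_key
    validated: List[str] = []
    for ds in chain:
        if key is None:
            break
        expected = standin_sign(key, ds.payload + (ds.next_key or b""))
        if not hmac.compare_digest(expected, ds.signature):
            break  # everything further down the hierarchy is rejected too
        validated.append(ds.name)
        key = ds.next_key
    return validated
```

A failure at any level stops validation of everything below it, which mirrors the way a key compromise propagates down the hierarchy but never up.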

[4603] If a key is compromised, it compromises all subsequent authorizations down the hierarchy. In the example from above (and as illustrated in FIG. 326) if the OEM's asymmetric private key is compromised, then O/S program code is not compromised since it is above OEM program code in the authentication hierarchy. However if the ComCo's asymmetric private key is compromised, then the OEM program code is also compromised. A compromise of boot0key compromises everything up to SoPEC itself, and would require a mask ROM change in SoPEC to fix.

[4604] It is worthwhile repeating that in any hierarchy the security of the entire hierarchy is based on keeping the asymmetric private key paired to boot0key secure. It is also a requirement that the program that signs (i.e. authorizes) datasets using the asymmetric private key paired to boot0key be kept secure.

[4605] 3.6.3 Developing Program Code at Manufacturer/Owner

[4606] The hierarchical boot procedure described in Section 3.6.1 and Section 3.6.2 gives a hierarchy of protection in a final shipped product.

[4607] It is also desirable to use a hierarchy of protection during software development within Manufacturer/owner.

[4608] For a program to be downloaded and run on SoPEC during development, it will need to be signed. In addition, we don't want to have to sign each and every Manufacturer/owner development code with the boot0key, as it creates the possibility of any developmental (including buggy or rogue) application being run on any SoPEC.

[4609] Therefore QACo needs to generate/create a special intermediate boot loader, signed with boot0key, that performs the exact same tasks as the normal boot loader, except that it checks the SoPEC_id to see if it is a specific SoPEC_id (or set of SoPEC_ids). If the SoPEC_id is in the valid set, then the developmental boot loader validates dataset2 by means of its length and a SHA-1 digest of the developmental code³, and not by a further digital signature. The QACo can give this boot loader to the software development team within Manufacturer/owner. The software team can now write and run any program code, and load the program code using the development boot loader. There is no requirement for the subsequent software program (i.e. the developmental program code) to be signed with any key since the programs can only be run on the particular SoPECs.
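The developmental boot loader's check can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, and the expected length and digest are assumed to be supplied alongside the developmental code, which is not spelled out in the text.

```python
import hashlib
import hmac
from typing import Iterable


def dev_loader_accepts(sopec_id: bytes,
                       allowed_ids: Iterable[bytes],
                       code: bytes,
                       expected_length: int,
                       expected_sha1: bytes) -> bool:
    """Accept developmental code only on whitelisted SoPECs.

    No further digital signature is checked: the code is validated only by
    its length and its SHA-1 digest.
    """
    if sopec_id not in set(allowed_ids):
        return False  # not one of the development SoPECs
    if len(code) != expected_length:
        return False
    return hmac.compare_digest(hashlib.sha1(code).digest(), expected_sha1)
```

Since the whitelist is part of the loader signed with boot0key, an attacker who obtains the loader can still only run code on the listed SoPECs.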

[4610] If the developmental boot loader (and/or signature generator) were compromised, or any of the developmental programs were compromised, the worst situation is that an attacker could run programs on that particular set of SoPECs, and on no others.

[4611] This should greatly reduce the possibility of erroneous programs signed with boot0key being available to an attacker (only official releases are signed by boot0key), and therefore reduces the possibility of a Manufacturer/owner employee intentionally or inadvertently creating a back door for attackers.

[4612] The relationship is shown below in FIG. 327.

[4613] Theoretically the same kind of hierarchy could also be used to allow OEMs to be assured that their program code will only work on specific SoPECs, but this is unlikely to be necessary, and is probably undesirable.

[4614] 3.6.4 Date-Limited Loaders

[4615] It is possible that errors in supervisor program code (e.g. the operating system) could allow attackers to subvert the program in SoPEC and gain supervisor control.

[4616] To reduce the impact of this kind of attack, it is possible to allocate some bits of the SoPEC_id to form some kind of date. The granularity of the date could be as simple as a single bit that says the date is obtained from the regular IBM ECID, or it could be 6 bits that give 10 years' worth of 3-month units.

[4617] The first step of the program loaded by boot loader 0 could check the SoPEC_id date, and run or refuse to run appropriately. The Manufacturer/owner driver or OS could therefore be limited to run on SoPECs that are manufactured up until a particular date.
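As a worked illustration of the 6-bit case: ten years of 3-month units is 40 values, which fits within the 64 values that 6 bits can encode. The sketch below assumes, purely for illustration, that the date field occupies the low 6 bits of SoPEC_id; the field position and function names are hypothetical.

```python
DATE_BITS = 6
UNITS_PER_YEAR = 4  # 3-month units


def sopec_date_units(sopec_id: int) -> int:
    """Extract the manufacture date, in 3-month units, from the SoPEC_id.

    Assumption: the date field is stored in the low DATE_BITS bits.
    """
    return sopec_id & ((1 << DATE_BITS) - 1)


def may_run(sopec_id: int, latest_supported_unit: int) -> bool:
    """First step of the loaded program: refuse SoPECs newer than the cut-off."""
    return sopec_date_units(sopec_id) <= latest_supported_unit
```

A driver built with latest_supported_unit set to 7 would run on SoPECs from the first two years of manufacture and refuse any later chip.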

[4618] This means that the OEM would require a new version of the OS for SoPECs after a particular date, but the new driver could be made to work on all previous versions of SoPEC.

[4619] The function simply requires a form of date, whose granularity for working can be determined by agreement with the OEM.

[4620] For example, suppose that SoPECs are supplied with 3-month granularity in their date components. Manufacturer/owner could ship a version of the OS that works for any SoPEC of the date (i.e. on any chip), or for all SoPECs manufactured during the year etc. The driver issued the next year could work with all SoPECs up until that year, etc. In this way the drivers for a chip will be backwards compatible, but will be deliberately not forwards-compatible. It allows the downloading of a new driver with no problems, but it prevents bugs in one year's driver OS from being used against future SoPECs.

[4621] Note that the phasing in of a new OS doesn't have to be at the same time as the hardware. For example, the new OS can come in 3 months before the hardware that it supports. However once the new SoPECs are being delivered, the OEM must not ship the older driver with the newer SoPECs, for the old driver will not work on the newer SoPECs. Basically once the OEM has received the new driver, they should use that driver for all SoPEC systems from that point on (old SoPECs will work with the new driver).

[4622] This date-limiting feature would most likely use a field in the ComCo specified operating parameters, so it allows the SoPEC to use date-checking in addition to other QA Chip related parameter checking (such as the OEM's PrintEngineLicenseId etc).

[4623] A variant on this theme is a date-window, where a start-date and end-date are specified (as relating to SoPEC manufacture, not date of use).

[4624] 3.6.5 Authenticating Operating Parameters

[4625] Operating parameters need to be considered in terms of Manufacturer/owner operating parameters and OEM operating parameters. Both sets of operating parameters are stored on the PRINTER_QA chip (physically located inside the printer). This allows the printer to maintain parameters regardless of being moved to different computers, or a loss/replacement of host O/S drivers etc.

[4626] On PRINTER_QA, memory vector M₀ contains the upgradable operating parameters, and memory vectors M₁₊ contain any constant (non-upgradable) operating parameters.

[4627] Considering only Manufacturer/owner operating parameters for the moment, there are actually two problems:

[4628] a. setting and storing the Manufacturer/owner operating parameters, which should be authorized only by Manufacturer/owner

[4629] b. reading the parameters into SoPEC, which is an issue of SoPEC authenticating the data on the PRINTER_QA chip since we don't trust PRINTER_QA.

[4630] The PRINTER_QA chip therefore contains the following symmetric keys:

[4631] K₀=PrintEngineLicense_key. This key is constant for all SoPECs supplied for a given print engine license agreement between an OEM and a Manufacturer/owner ComCo. K₀ has write permissions to the Manufacturer/owner upgradeable region of M₀ on PRINTER_QA.

[4632] K₁=SoPEC_id_key. This key is unique for each SoPEC (see Section 3.1), and is known only to the SoPEC and PRINTER_QA. K₁ does not have write permissions for anything.

[4633] K₀ is used to solve problem (a). It is only used to authenticate the actual upgrades of the operating parameters. Upgrades are performed using the standard upgrade protocol described in [5], with PRINTER_QA acting as the ChipU, and the external upgrader acting as the ChipS.

[4634] K₁ is used by SoPEC to solve problem (b). It is used to authenticate reads of data (i.e. the operating parameters) from PRINTER_QA. The procedure follows the standard authenticated read protocol described in [5], with PRINTER_QA acting as ChipR, and the embedded supervisor software on SoPEC acting as ChipT. The authenticated read protocol [5] requires the use of a 160-bit nonce, which is a pseudo-random number. This creates the problem of introducing pseudo-randomness into SoPEC that is not readily determinable by OEM programs, especially given that SoPEC boots into a known state. One possibility is to use the same random number generator as in the QA Chip (a 160-bit maximal-length linear feedback shift register) with the seed taken from the value in the WatchDogTimer register in SoPEC's timer unit when the first page arrives.

[4635] Note that the procedure for verifying reads of data from PRINTER_QA does not rely on Manufacturer/owner's key K₀. This means that precisely the same mechanism can be used to read and authenticate the OEM data also stored in PRINTER_QA. Of course this must be done by Manufacturer/owner supervisor code so that SoPEC_id_key is not revealed.

[4636] If the OEM also requires upgradable parameters, we can add an extra key to PRINTER_QA, where that key is an OEM_key and has write permissions to the OEM part of M₀.

[4637] In this way, K₁ never needs to be known by anyone except the SoPEC and PRINTER_QA.

[4638] Each printing SoPEC in a multi-SoPEC system needs access to a PRINTER_QA chip that contains the appropriate SoPEC_id_key to validate ink usage and operating parameters. This can be accomplished by a separate PRINTER_QA for each SoPEC, or by adding extra keys (multiple SoPEC_id_keys) to a single PRINTER_QA.

[4639] However, if ink usage is not being validated (e.g. if print speed were the only Manufacturer/owner upgradable parameter) then not all SoPECs require access to a PRINTER_QA chip that contains the appropriate SoPEC_id_key. Assuming that OEM program code controls the physical motor speed (different motors per OEM), then the PHI within the first (or only) front-page SoPEC can be programmed to accept (or generate) line sync pulses no faster than a particular rate. If line syncs arrived faster than the particular rate, the PHI would simply print at the slower rate. If the motor speed were hacked to run faster, the printed image would appear stretched.

[4640] 3.6.5.1 Floating Operating Parameters and Dongles

[4641] As described in Section 2.1.2, Manufacturer/owner operating parameters include such items as print speed, print quality etc. and are tied to a license provided to an OEM. These parameters are under Manufacturer/owner control. The licensed Manufacturer/owner operating parameters are typically stored in the PRINTER_QA as described in Section 3.6.5.

[4642] However there are situations when it is desirable to have a floating upgrade to a license, for use on a printer of the user's choice. For example, OEMs may sell a speed-increase license upgrade that can be plugged into the printer of the user's choice. This form of upgrade can be considered a floating upgrade in that it upgrades whichever printer it is currently plugged into. This dongle is referred to as ADDITIONAL_PRINTER_QA. The software checks for the existence of an ADDITIONAL_PRINTER_QA, and if present the operating parameters are chosen from the values stored on both QA chips.

[4643] The basic problem of authenticating the additional operating parameters boils down to the problem that we don't trust ADDITIONAL_PRINTER_QA. Therefore we need a system whereby a given SoPEC can perform an authenticated read of the data in ADDITIONAL_PRINTER_QA.

[4644] We should not write the SoPEC_id_key to a key in the ADDITIONAL_PRINTER_QA because:

[4645] then it will be tied specifically to that SoPEC, and the primary intention of the ADDITIONAL_PRINTER_QA is that it be floatable;

[4646] the ink cartridge would then not work in another printer since the other printer would not know the old SoPEC_id_key (knowledge of the old key is required in order to change the old key to a new one);

[4647] updating keys is not power-safe (i.e. if at the user's site, power is removed mid-update, the ADDITIONAL_PRINTER_QA could be rendered useless).

[4648] The proposed solution is to let ADDITIONAL_PRINTER_QA have two keys:

[4649] K₀=FloatingPrintEngineLicense_key. This key has the same function as the PrintEngineLicense_key in the PRINTER_QA⁴ in that K₀ has write permissions to the Manufacturer/owner upgradeable region of M₀ on ADDITIONAL_PRINTER_QA.

[4650] K₁=UseExtParmsLicense_key. This key is constant for all of the ADDITIONAL_PRINTER_QAs for a given license agreement between an OEM and a Manufacturer/owner ComCo (this is not the same key as PrintEngineLicense_key which is stored as K₀ in PRINTER_QA). K₁ has no write permissions to anything.

[4651] K₀ is used to allow writes to the various fields containing operating parameters in the ADDITIONAL_PRINTER_QA. These writes/upgrades are performed using the standard upgrade protocol described in [5], with ADDITIONAL_PRINTER_QA acting as the ChipU, and the external upgrader acting as the ChipS. The upgrader (ChipS) also needs to check the appropriate licensing parameters such as OEM_Id for validity.

[4652] K₁ is used to allow SoPEC to authenticate reads of the ink remaining and any other ink data. This is accomplished by having the same UseExtParmsLicense_key within PRINTER_QA (e.g. in K₂), also with no write permissions, i.e.:

[4653] PRINTER_QA.K₂=UseExtParmsLicense_key. This key is constant for all of the PRINTER_QAs for a given license agreement between an OEM and a Manufacturer/owner ComCo. K₂ has no write permissions to anything.

[4654] This means there are two shared keys, with PRINTER_QA sharing both, and thereby acting as a bridge between ADDITIONAL_PRINTER_QA and SoPEC.

[4655] UseExtParmsLicense_key is shared between PRINTER_QA and ADDITIONAL_PRINTER_QA

[4656] SoPEC_id_key is shared between SoPEC and PRINTER_QA

[4657] All SoPEC has to do is perform an authenticated read [6] from ADDITIONAL_PRINTER_QA, pass the data/signature to PRINTER_QA, let PRINTER_QA validate the data/signature, and get PRINTER_QA to produce a similar signature based on the shared SoPEC_id_key. It can do so using the Translate function [6]. SoPEC can then compare PRINTER_QA's signature with its own calculated signature (i.e. implement a Test function [6] in software on SoPEC), and if the signatures match, the data from ADDITIONAL_PRINTER_QA must be valid, and can therefore be trusted.

[4658] Once the data from ADDITIONAL_PRINTER_QA is known to be trusted, the various operating parameters such as OEM_Id can be checked for validity.

[4659] The actual steps of read authentication as performed by SoPEC are:

R_PRINTER ← PRINTER_QA.random()
R_DONGLE, M_DONGLE, SIG_DONGLE ← DONGLE_QA.read(K1, R_PRINTER)
R_SOPEC ← random()
R_PRINTER, SIG_PRINTER ← PRINTER_QA.translate(K2, R_DONGLE, M_DONGLE, SIG_DONGLE, K1, R_SOPEC)
SIG_SOPEC ← HMAC_SHA_1(SoPEC_id_key, M_DONGLE | R_PRINTER | R_SOPEC)
If (SIG_PRINTER = SIG_SOPEC)
  // various parms inside M_DONGLE (data read from ADDITIONAL_PRINTER_QA) are valid
Else
  // the data read from ADDITIONAL_PRINTER_QA is not valid and cannot be trusted
EndIf
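The exchange above can be modelled in software as follows. This is a sketch under stated assumptions, not the QA Chip protocol of [5] or [6]: the QAChip class is a hypothetical model, and the order of fields inside the dongle's signature (M | local nonce | external nonce) is assumed by analogy with the SIG_SOPEC line.

```python
import hashlib
import hmac
import secrets


def hmac_sha1(key: bytes, *parts: bytes) -> bytes:
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class QAChip:
    """Hypothetical software model of a QA chip holding keys and memory M."""

    def __init__(self, keys: dict, memory: bytes):
        self.keys = keys
        self.memory = memory
        self._r = secrets.token_bytes(20)  # current 160-bit nonce

    def random(self) -> bytes:
        return self._r

    def read(self, slot: int, r_external: bytes):
        r_local = self._r
        sig = hmac_sha1(self.keys[slot], self.memory, r_local, r_external)
        self._r = secrets.token_bytes(20)
        return r_local, self.memory, sig

    def translate(self, in_slot, r_remote, m, sig_in, out_slot, r_external):
        """Check sig_in under one key, then re-sign the data under another."""
        r_local = self._r
        expected = hmac_sha1(self.keys[in_slot], m, r_remote, r_local)
        if not hmac.compare_digest(expected, sig_in):
            return None
        sig_out = hmac_sha1(self.keys[out_slot], m, r_local, r_external)
        self._r = secrets.token_bytes(20)
        return r_local, sig_out


def sopec_read_dongle(printer_qa: QAChip, dongle_qa: QAChip, sopec_id_key: bytes):
    """Return M_DONGLE if it can be trusted, otherwise None."""
    r_printer = printer_qa.random()
    r_dongle, m_dongle, sig_dongle = dongle_qa.read(1, r_printer)
    r_sopec = secrets.token_bytes(20)
    result = printer_qa.translate(2, r_dongle, m_dongle, sig_dongle, 1, r_sopec)
    if result is None:
        return None
    r_printer, sig_printer = result
    sig_sopec = hmac_sha1(sopec_id_key, m_dongle, r_printer, r_sopec)
    return m_dongle if hmac.compare_digest(sig_printer, sig_sopec) else None
```

SoPEC never needs UseExtParmsLicense_key: it only checks the signature that PRINTER_QA produces with the shared SoPEC_id_key.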

[4660] 3.6.5.2 Dongles Tied to a Given SoPEC

[4661] Section 3.6.5.1 describes floating dongles, i.e. dongles that can be used on any SoPEC. Sometimes it is desirable to tie a dongle to a specific SoPEC.

[4662] Tying a QA_CHIP to be used only on a specific SoPEC can be easily accomplished by writing the PRINTER_QA's chipId (unique serial number) into an appropriate M₀ field on the ADDITIONAL_PRINTER_QA. The system software can detect the match and function appropriately. If there is no match, the software can ignore the data read from the ADDITIONAL_PRINTER_QA.

[4663] Although it is also possible to store the SoPEC_id_key in one of the keys within the dongle, this must be done in an environment where power will not be removed partway through the key update process (if power is removed during the key update there is a possibility that the dongle QA Chip may be rendered unusable, although this can be checked for after the power failure).

[4664] 3.6.5.3 OEM Assembly-Line Test

[4665] Although an OEM should only be able to sell the licensed operating parameters for a given Print Engine, they must be able to assembly-line test⁵ or service/test the Print Engine with a different set of operating parameters, e.g. a maximally upgraded Print Engine.

[4666] Several different mechanisms can be employed to allow OEMs to test the upgraded capabilities of the Print Engine. At present it is unclear exactly what kind of assembly-line tests would be performed.

[4667] The simplest solution is to use an ADDITIONAL_PRINTER_QA (i.e. special dongle PRINTER_QA as described in Section 3.6.5.1). The ADDITIONAL_PRINTER_QA would contain the operating parameters that maximally upgrade the printer as long as the dongle is connected to the SoPEC. The exact connection may be directly electrical (e.g. via the standard QA Chip connections) or may be over the USB connection to the printer test host depending on the nature of the test. The exact preferred connection is yet to be determined.

[4668] In the testing environment, the ADDITIONAL_PRINTER_QA also requires a numberOfImpressions field inside M₀, which is writeable by K₀. Before the SoPEC prints a page at the higher speed, it decrements the numberOfImpressions counter, performs an authenticated read to ensure the count was decremented, and then prints the page. In this way, the total number of pages that can be printed at high speed is limited in the event of someone stealing the ADDITIONAL_PRINTER_QA device. It also means that multiple test machines can make use of the same ADDITIONAL_PRINTER_QA.
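The decrement-then-verify sequence for numberOfImpressions can be sketched as follows. This is a minimal illustration only: the DongleSim class and try_high_speed_page function are hypothetical names, the authenticated read is reduced to returning the counter (a real read would return data plus a signature validated via PRINTER_QA), and reading the counter before the decrement is an assumption made so that several test machines can share one dongle.

```python
class DongleSim:
    """Hypothetical stand-in for an ADDITIONAL_PRINTER_QA holding numberOfImpressions."""

    def __init__(self, impressions):
        self.number_of_impressions = impressions

    def decrement(self):
        # Counter never goes below 0
        if self.number_of_impressions > 0:
            self.number_of_impressions -= 1

    def authenticated_read(self):
        # Simplification: signature generation and checking are omitted here
        return self.number_of_impressions


def try_high_speed_page(dongle):
    """Return True only if the counter was observed to drop by exactly one."""
    before = dongle.authenticated_read()
    if before == 0:
        return False  # no impressions left, do not print at the higher speed
    dongle.decrement()
    after = dongle.authenticated_read()
    return after == before - 1
```

Once the counter reaches zero, every further request is refused, which is what bounds the value of a stolen dongle.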

[4669] 3.6.6 Use of a PrintEngineLicense id

[4670] Manufacturer/owner O/S program code contains the OEM's asymmetric public key to ensure that the subsequent OEM program code is authentic, i.e. from the OEM. However, given that SoPEC only contains a single root key, it is theoretically possible for different OEMs' applications to be run on physically identical Print Engines, i.e. a printer driver for OEM₁ run on a physically identical Print Engine from OEM₂.

[4671] To guard against this, the Manufacturer/owner O/S program code contains a PrintEngineLicense_id code (e.g. 16 bits) that matches the same named value stored as a fixed operating parameter in the PRINTER_QA (i.e. in M₁₊). As with all other operating parameters, the value of PrintEngineLicense_id is stored in PRINTER_QA (and any ADDITIONAL_PRINTER_QA devices) at the same time as the other various PRINTER_QA customizations are being applied, before being shipped to the OEM site.

[4672] In this way, the OEMs can be sure of differentiating themselves through software functionality.

[4673] 3.6.7 Authentication of Ink

[4674] The Manufacturer/owner O/S must perform ink authentication [6] during prints. Ink usage authentication makes use of counters in SoPEC that keep an accurate record of the exact number of dots printed for each ink.

[4675] The ink amount remaining in a given cartridge is stored in that cartridge's INK_QA chip. Other data stored on the INK_QA chip includes ink color, viscosity, Memjet firing pulse profile information, as well as licensing parameters such as OEM_ID, inkType, InkUsageLicense_ID, etc. This information is typically constant, and is therefore likely to be stored in M₁₊ within INK_QA.

[4676] Just as the Print Engine operating parameters are validated by means of PRINTER_QA, a given Print Engine license may only be permitted to function with specifically licensed ink. Therefore the software on SoPEC could contain a valid set of ink types, colors, OEM_Ids, InkUsageLicense_Ids etc. for subsequent matching against the data in the INK_QA.

[4677] SoPEC must be able to authenticate reads from the INK_QA, both in terms of ink parameters as well as ink remaining.

[4678] To authenticate ink a number of steps must be taken:

[4679] restrict access to dot counts

[4680] authenticate ink usage and ink parameters via INK_QA and PRINTER_QA

[4681] broadcast ink dot usage to all SoPECs in a multi-SoPEC system

[4682] 3.6.7.1 Restrict Access to Dot Counts

[4683] Since the dot counts are accessed via the PHI in the PEP section of SoPEC, access to these registers (and more generally all PEP registers) must be only available from supervisor mode, and not by OEM code (running in user mode). Otherwise it might be possible for OEM program code to clear dot counts before authentication has occurred.

[4684] 3.6.7.2 Authenticate Ink Usage and Ink Parameters via INK_QA and PRINTER_QA

[4685] The basic problem of authentication of ink remaining and other ink data boils down to the problem that we don't trust INK_QA. Therefore how can a SoPEC know the initial value of ink (or the ink parameters), and how can a SoPEC know that after a write to the INK_QA, the count has been correctly decremented?

[4686] Taking the first issue, which is determining the initial ink count or the ink parameters, we need a system whereby a given SoPEC can perform an authenticated read of the data in INK_QA.

[4687] We cannot write the SoPEC_id_key to the INK_QA for two reasons:

[4688] updating keys is not power-safe (i.e. if power is removed mid-update, the INK_QA could be rendered useless)

[4689] the ink cartridge would then not work in another printer since the other printer would not know the old SoPEC_id_key (knowledge of the old key is required in order to change the old key to a new one).

[4690] The proposed solution is to let INK_QA have two keys:

[4691] K₀=SupplyInkLicense_key. This key is constant for all ink cartridges for a given ink supply agreement between an OEM and a Manufacturer/owner ComCo (this is not the same key as PrintEngineLicense_key which is stored as K₀ in PRINTER_QA). K₀ has write permissions to the ink remaining regions of M₀ on INK_QA.

[4692] K₁=UseInkLicense_key. This key is constant for all ink cartridges for a given ink usage agreement between an OEM and a Manufacturer/owner ComCo (this is not the same key as PrintEngineLicense_key which is stored as K₀ in PRINTER_QA). K₁ has no write permissions to anything.

[4693] K₀ is used to authenticate the actual upgrades of the amount of ink remaining (e.g. to fill and refill the amount of ink). Upgrades are performed using the standard upgrade protocol described in [5], with INK_QA acting as the ChipU, and the external upgrader acting as the ChipS. The fill and refill upgrader (ChipS) also needs to check the appropriate ink licensing parameters such as OEM_ID, InkType and InkUsageLicense_Id for validity.

[4694] K₁ is used to allow SoPEC to authenticate reads of the ink remaining and any other ink data. This is accomplished by having the same UseInkLicense_key within PRINTER_QA (e.g. in K₂ or K₃), also with no write permissions.

[4695] This means there are two shared keys, with PRINTER_QA sharing both, and thereby acting as a bridge between INK_QA and SoPEC.

[4696] UseInkLicense_key is shared between INK_QA and PRINTER_QA

[4697] SoPEC_id_key is shared between SoPEC and PRINTER_QA

[4698] All SoPEC has to do is perform an authenticated read [6] from INK_QA, pass the data/signature to PRINTER_QA, let PRINTER_QA validate the data/signature and get PRINTER_QA to produce a similar signature based on the shared SoPEC_id_key (i.e. the Translate function [6]). SoPEC can then compare PRINTER_QA's signature with its own calculated signature (i.e. implement a Test function [6] in software on the SoPEC), and if the signatures match, the data from INK_QA must be valid, and can therefore be trusted.

[4699] Once the data from INK_QA is known to be trusted, the amount of ink remaining can be checked, and the other ink licensing parameters such as OEM_ID, InkType, InkUsageLicense_Id can be checked for validity.

[4700] The actual steps of read authentication as performed by SoPEC are:

R_(PRINTER) ← PRINTER_QA.random( )
R_(INK), M_(INK), SIG_(INK) ← INK_QA.read(K1, R_(PRINTER)) // read with key1: UseInkLicense_key
R_(SOPEC) ← random( )
R_(PRINTER), SIG_(PRINTER) ← PRINTER_QA.translate(K2, R_(INK), M_(INK), SIG_(INK), K1, R_(SOPEC))
SIG_(SOPEC) ← HMAC_SHA_1(SoPEC_id_key, M_(INK) | R_(PRINTER) | R_(SOPEC))
If (SIG_(PRINTER) = SIG_(SOPEC))
  // M_(INK) (data read from INK_QA) is valid
  // M_(INK) could be ink parameters, such as InkUsageLicense_Id, or ink remaining
  If (M_(INK).inkRemaining = expectedInkRemaining)
    // all is ok
  Else
    // the ink value is not what we wrote, so don't print anything anymore
  EndIf
Else
  // the data read from INK_QA is not valid and cannot be trusted
EndIf

[4701] Strictly speaking, we don't need a nonce (R_(SOPEC)) all the time because M_(A) (containing the ink remaining) should be decrementing between authentications. However we do need one to retrieve the initial amount of ink and the other ink parameters (at power up). This is why taking a random number from the WatchDogTimer at the receipt of the first page is acceptable.

[4702] In summary, the SoPEC performs the non-authenticated write [6] of ink remaining to the INK_QA chip, and then performs an authenticated read of the data via the PRINTER_QA as per the pseudocode above. If the value is authenticated, and the INK_QA ink-remaining value matches the expected value, the count was correctly decremented and the printing can continue.
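The read authentication via PRINTER_QA can be simulated in software as a sketch. Several things here are assumptions rather than the specified implementation: the QAChipSim class and sopec_read_ink function are hypothetical names, C1 is a placeholder value, nonces come from os.urandom, the signed message is laid out as nonce | R | C1 | M, the translate call returns the R it signed with, and key slot 3 is used for UseInkLicense_key in PRINTER_QA as listed in Section 3.8.

```python
import hashlib
import hmac
import os

C1 = b"\x01"  # placeholder for the padding constant C1 (assumed value)


def sign(key, *parts):
    """HMAC-SHA1 over the concatenation of the given byte strings."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class QAChipSim:
    """Hypothetical software model of a QA Chip with keys, a nonce R and data M."""

    def __init__(self, keys, memory=b""):
        self.keys = keys          # mapping of key slot number to key bytes
        self.memory = memory
        self.r = os.urandom(20)

    def random(self):
        return self.r             # does not advance R

    def read(self, n, x):
        self.r = os.urandom(20)   # advance R, then sign with the new value
        return self.r, self.memory, sign(self.keys[n], x, self.r, C1, self.memory)

    def translate(self, n1, x, y, z, n2, a):
        expected = sign(self.keys[n1], self.r, x, C1, y)
        if not hmac.compare_digest(expected, z):   # constant-time comparison
            return None
        r_used = self.r
        out = sign(self.keys[n2], a, r_used, C1, y)
        self.r = os.urandom(20)   # advance R after a successful translation
        return r_used, out


def sopec_read_ink(ink_qa, printer_qa, sopec_id_key):
    """Return the ink data if it authenticates through PRINTER_QA, else None."""
    r_printer = printer_qa.random()
    r_ink, m_ink, sig_ink = ink_qa.read(1, r_printer)
    r_sopec = os.urandom(20)
    result = printer_qa.translate(3, r_ink, m_ink, sig_ink, 1, r_sopec)
    if result is None:
        return None
    r_printer, sig_printer = result
    sig_sopec = sign(sopec_id_key, r_sopec, r_printer, C1, m_ink)
    return m_ink if hmac.compare_digest(sig_printer, sig_sopec) else None
```

An INK_QA that does not hold the matching UseInkLicense_key fails inside translate, so SoPEC never receives a signature it can accept.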

[4703] 3.6.7.3 Broadcast Ink Dot Usage to all SoPECs in a Multi-SoPEC System

[4704] In a multi-SoPEC system, each SoPEC attached to a printhead must broadcast its ink usage to all the SoPECs. In this way, each SoPEC will have its own version of the expected ink usage.

[4705] In the case of a man-in-the-middle attack, at worst the count in a given SoPEC is only its own count (i.e. all broadcasts are turned into 0 ink usage by the man-in-the-middle). We would also require the broadcast amount to be treated as an unsigned integer to prevent negative amounts from being substituted.

[4706] A single SoPEC performs the update of ink remaining to the INK_QA chip, and then all SoPECs perform an authenticated read of the data via the appropriate PRINTER_QA (the PRINTER_QA that contains their matching SoPEC_id_key; remember that multiple SoPEC_id_keys can be stored in a single PRINTER_QA). If the value is authenticated, and the INK_QA value matches the expected value, the count was correctly decremented and the printing can continue.

[4707] If any of the broadcasts are not received, or have been tampered with, the updated ink counts will not match. The only case this does not cater for is if each SoPEC is tricked (via a USB2 inter-SoPEC-comms man-in-the-middle attack) into a total that is the same, yet not the true total. Apart from the fact that this is not viable for general pages, at worst this is the maximum amount of ink printed by a single SoPEC. We don't care about protecting against this case.
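How each SoPEC might derive its expected ink-remaining value from the broadcasts is sketched below. The function name, the 32-bit counter width and the floor at zero are assumptions for illustration; the point shown is that reinterpreting every broadcast amount as unsigned prevents a substituted negative value from increasing the expected total.

```python
UNSIGNED_MASK = 0xFFFFFFFF  # assumed 32-bit broadcast field


def expected_ink_remaining(previous_remaining, own_usage, broadcasts):
    """Expected INK_QA value after subtracting this SoPEC's and all broadcast usage."""
    total = own_usage
    for amount in broadcasts:
        # Treat the received amount as unsigned, so -5 becomes a huge positive value
        total += amount & UNSIGNED_MASK
    return max(previous_remaining - total, 0)
```

If a man-in-the-middle zeroes the broadcasts, the SoPEC computes 990 instead of 940 in the case of 1000 remaining, 10 used locally and 20 and 30 broadcast, and the mismatch against the authenticated INK_QA value stops printing.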

[4708] Since a typical maximum is 4 printing SoPECs, it requires at most 4 authenticated reads. This should be completed within 0.5 seconds, well within the 1-2 seconds/page print time.

[4709] 3.6.8 Example Hierarchy

[4710] Adding an extra bootloader step to the example from Section 3.6.2, we can break up the contents of program space into logical sections, as shown in Table 227. Note that the ComCo does not provide any program code, merely operating parameters that are used by the O/S.

TABLE 227 Sections of Program Space

Section 0
  Contents: boot loader 0 (ROM); SHA-1 function; asymmetric decrypt function; boot0key
  Verifies: section 1 via boot0key

Section 1
  Contents: boot loader 1; SoPEC_OS_public_key
  Verifies: section 2 via SoPEC_OS_public_key

Section 2
  Contents: Manufacturer/owner O/S program code; function to generate SoPEC_id_key from SoPEC_id; Basic Print Engine; ComCo_public_key
  Verifies: section 3 via ComCo_public_key; section 4 via OEM_public_key (supplied in section 3); PRINTER_QA data, which includes the PrintEngineLicense_id, Manufacturer/owner operating parameters, and OEM operating parameters (all authenticated via SoPEC_id_key)

Section 3
  Contents: ComCo license agreement operating parameter ranges, including PrintEngineLicense_id (gets loaded into supervisor mode section of memory); OEM_public_key (gets loaded into supervisor mode section of memory); any ComCo written user-mode program code (gets loaded into user mode section of memory)
  Verifies: is used by section 2 to verify section 4 and range of parameters as found in PRINTER_QA

Section 4
  Contents: OEM specific program code
  Verifies: OEM operating parameters via calls to Manufacturer/owner O/S code

[4711] The verification procedures will be required each time the CPU is woken up, since the RAM is not preserved.

[4712] 3.6.9 What if the CPU is Not Fast Enough?

[4713] In the example of Section 3.6.8, every time the CPU is woken up to print a document it needs to perform:

[4714] SHA-1 on all program code and program data

[4715] 4 sets of asymmetric decryption to load the program code and data

[4716] 1 HMAC-SHA1 generation per 512-bits of Manufacturer/owner and OEM printer and ink operating parameters

[4717] Although the SHA-1 and HMAC process will be fast enough on the embedded CPU (the program code will be executing from ROM), it may be that the asymmetric decryption will be slow. And this becomes more likely with each extra level of authentication. If this is the case (as is likely), hardware acceleration is required.

[4718] A cheap form of hardware acceleration takes advantage of the fact that in most cases the same program is loaded each time, with the first time likely to be at power-up. The hardware acceleration is simply data storage for the authorizedDigest, which means that the boot procedure now is:

slowCPU_bootloader0(data, sig)
  localDigest ← SHA-1(data)
  If (localDigest = previouslyStoredAuthorizedDigest)
    jump to program code at data-start address // will never return
  Else
    authorizedDigest ← decrypt(sig, boot0key)
    expectedDigest = 0x00 | 0x01 | 0xFF..0xFF | 0x003021300906052B0E03021A05000414 | localDigest
    If (authorizedDigest == expectedDigest)
      previouslyStoredAuthorizedDigest ← localDigest
      jump to program code at data-start address // will never return
    Else
      // program code is unauthorized
    EndIf
  EndIf

[4719] This procedure means that a reboot of the same authorized program code will only require SHA-1 processing. At power-up, or if new program code is loaded (e.g. an upgrade of a driver over the internet), then the full authorization via asymmetric decryption takes place. This is because the stored digest will not match at power-up and whenever a new program is loaded.
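The digest-caching boot check can be sketched as below. This is an illustration under stated assumptions: the BootLoader class and expected_block function are hypothetical names, the asymmetric decryption is injected as a callable rather than implemented, the 128-byte block length (a 1024-bit key) is assumed, and the function returns True where real boot code would jump to the program. The constant is the 15-byte SHA-1 prefix that follows the 0x00 separator in expectedDigest.

```python
import hashlib

# Prefix bytes placed between the 0x00 separator and the SHA-1 digest
SHA1_PREFIX = bytes.fromhex("3021300906052B0E03021A05000414")


def expected_block(local_digest, block_len):
    """Build 0x00 | 0x01 | 0xFF..0xFF | 0x00 | prefix | digest, block_len bytes long."""
    pad_len = block_len - 3 - len(SHA1_PREFIX) - len(local_digest)
    return b"\x00\x01" + b"\xff" * pad_len + b"\x00" + SHA1_PREFIX + local_digest


class BootLoader:
    def __init__(self, decrypt, block_len=128):
        self.decrypt = decrypt        # stand-in for decrypt(sig, boot0key)
        self.block_len = block_len    # assumed key size in bytes
        self.stored_digest = None     # previouslyStoredAuthorizedDigest, empty at power-up
        self.slow_path_count = 0      # how often the slow decryption was needed

    def authorize(self, data, sig):
        local = hashlib.sha1(data).digest()
        if local == self.stored_digest:
            return True               # fast path: SHA-1 only
        self.slow_path_count += 1
        authorized = self.decrypt(sig)
        if authorized == expected_block(local, self.block_len):
            self.stored_digest = local
            return True
        return False                  # program code is unauthorized
```

A second boot of the same authorized program takes the fast path, while modified program code falls through to the full check and is rejected without disturbing the stored digest.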

[4720] The question is how much preserved space is required.

[4721] Each digest requires 160 bits (20 bytes), and this is constant regardless of the asymmetric encryption scheme or the key length. While it is possible to reduce this number of bits, thereby sacrificing security, the cost is small enough to warrant keeping the full digest.

[4722] However each level of boot loader requires its own digest to be preserved. This gives a maximum of 20 bytes per loader. Digests for operating parameters and ink levels may also be preserved in the same way, although these authentications should be fast enough not to require cached storage.

[4723] Assuming SoPEC provides for 12 digests (to be generous), this is a total of 240 bytes. These 240 bytes could easily be stored as 60×32-bit registers, or probably more conveniently as a small amount of RAM (e.g. 0.25-1 Kbyte). Providing something like 1 Kbyte of RAM has the advantage of allowing the CPU to store other useful data, although this is not a requirement.
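The storage figures above follow directly from the 160-bit digest size. A small sketch of the arithmetic, with digest_storage as a hypothetical helper name:

```python
DIGEST_BITS = 160  # size of one SHA-1 digest


def digest_storage(num_digests, register_bits=32):
    """Return (bytes needed, registers needed) to preserve num_digests digests."""
    total_bits = num_digests * DIGEST_BITS
    registers = -(-total_bits // register_bits)  # ceiling division
    return total_bits // 8, registers
```

For 12 digests this gives 240 bytes, or 60 registers of 32 bits each.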

[4724] In general, it is useful for the boot ROM to know whether it is being started up due to power-on reset, GPIO activity, or activity on the USB2. In the former case, it can ignore the previously stored values (either 0 for registers or garbage for RAM). In the latter cases, it can use the previously stored values. Even without this, a startup value of 0 (or garbage) means the digest won't match and therefore the authentication will occur implicitly.

[4725] 3.7 SoPEC Physical Identification

[4726] There must be a mapping of logical to physical since specific SoPECs are responsible for printing on particular physical parts of the page, and/or have particular devices attached to specific pins.

[4727] The identification process is mostly solved by general USB2 enumeration.

[4728] Each slave SoPEC will need to verify the boot broadcast messages received over USB2, and only execute the code if the signatures are valid. Several levels of authorization may occur. However, at some stage, this common program code (broadcast to all of the slave SoPECs and signed by the appropriate asymmetric private key) can, among other things, set the slave SoPEC's id relating to the physical location. If there is only 1 slave, the id is easy to determine, but if there is more than 1 slave, the id must be determined in some fashion. For example, physical location/id determination may be:

[4729] given by the physical USB2 port on the master

[4730] related to the physical wiring up of the USB2 interconnects

[4731] based on GPIO wiring. On other systems, a particular physical arrangement of SoPECs may exist such that each slave SoPEC will have a different set of connections on GPIOs. For example, one SoPEC may be in charge of motor control, while another may be driving the LEDs etc. The unused GPIO pins (not necessarily the same on each SoPEC) can be set as inputs and then tied to 0 or 1. As long as the connection settings are mutually exclusive, program code can determine which is which, and the id can be appropriately set.
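The GPIO-based option can be illustrated with a short sketch. The function name, the dictionary of sampled pin levels and the pin numbers are hypothetical; the only idea taken from the text is that unused pins tied to 0 or 1 yield a distinct pattern per slave.

```python
def slave_id_from_gpio(pin_levels, id_pins):
    """Combine the levels of the tied-off input pins into an integer id.

    pin_levels maps a pin number to its sampled level (0 or 1);
    id_pins lists the pins used for identification, least significant first.
    """
    slave_id = 0
    for bit, pin in enumerate(id_pins):
        slave_id |= (pin_levels[pin] & 1) << bit
    return slave_id
```

With pins 4, 5 and 6 tied to 1, 0 and 1 respectively, the function returns 5; a second slave wired 0, 1, 0 would return 2, so the two can be told apart.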

[4732] This scheme of slave SoPEC identification does not introduce a security breach. If an attacker rewires the pinouts to confuse identification, at best it will simply cause strange printouts (e.g. swapping of printout data) to occur, while at worst the Print Engine will simply not function.

[4733] 3.8 Setting Up QA Chip Keys

[4734] In use, each INK_QA chip needs the following keys:

[4735] K₀=SupplyInkLicense_key

[4736] K₁=UseInkLicense_key

[4737] Each PRINTER_QA chip tied to a specific SoPEC requires the following keys:

[4738] K₀=PrintEngineLicense_key

[4739] K₁=SoPEC_id_key

[4740] K₂=UseExtParmsLicense_key

[4741] K₃=UseInkLicense_key

[4742] Note that there may be more than one K₁ depending on the number of PRINTER_QA chips and SoPECs in a system. These keys need to be appropriately set up in the QA Chips before they will function correctly together.

[4743] 3.8.1 Original QA Chips as Received by a ComCo

[4744] When original QA Chips are shipped from QACo to a specific ComCo their keys are as follows:

[4745] K₀=QACo_ComCo_Key0

[4746] K₁=QACo_ComCo_Key1

[4747] K₂=QACo_ComCo_Key2

[4748] K₃=QACo_ComCo_Key3

[4749] All 4 keys are only known to QACo. Note that these keys are different for each QA Chip.

[4750] 3.8.2 Steps at the ComCo

[4751] The ComCo is responsible for making Print Engines out of Memjet printheads, QA Chips, PECs or SoPECs, PCBs etc.

[4752] In addition, the ComCo must customize the INK_QA chips and PRINTER_QA chip on-board the print engine before shipping to the OEM.

[4753] There are two stages:

[4754] replacing the keys in QA Chips with specific keys for the application (i.e. INK_QA and PRINTER_QA)

[4755] setting operating parameters as per the license with the OEM

[4756] 3.8.2.1 Replacing Keys

[4757] The ComCo is issued QID hardware [4] by QACo that allows programming of the various keys (except for K₁) in a given QA Chip to the final values, following the standard ChipF/ChipP replace key (indirect version) protocol [6]. The indirect version of the protocol allows each QACo_ComCo_Key to be different for each SoPEC.

[4758] In the case of programming of PRINTER_QA's K₁ to be SoPEC_id_key, there is the additional step of transferring an asymmetrically encrypted SoPEC_id_key (by the public-key) along with the nonce (R_(P)) used in the replace key protocol to the device that is functioning as a ChipF. The ChipF must decrypt the SoPEC_id_key so it can generate the standard replace key message for PRINTER_QA (functioning as a ChipP in the ChipF/ChipP protocol). The asymmetric key pair held in the ChipF equivalent should be unique to a ComCo (but still known only by QACo) to prevent damage in the case of a compromise.

[4759] Note that the various keys installed in the QA Chips (both INK_QA and PRINTER_QA) are only known to the QACo. The OEM only uses QIDs and QACo supplied ChipFs. The replace key protocol [6] allows the programming to occur without compromising the old or new key.

[4760] 3.8.2.2 Setting Operating Parameters

[4761] There are two sets of operating parameters stored in PRINTER_QA and INK_QA:

[4762] fixed

[4763] upgradable

[4764] The fixed operating parameters can be written to by means of non-authenticated writes [6] to M₁₊ via a QID [4], and permission bits set such that they are ReadOnly.

[4765] The upgradable operating parameters can only be written to after the QA Chips have been programmed with the correct keys as per Section 3.8.2.1. Once they contain the correct keys they can be programmed with appropriate operating parameters by means of a QID and an appropriate ChipS (containing matching keys).

[4766] Authentication Protocols

[4767] 1 Introduction

[4768] The following describes authentication protocols for general authentication applications, but with specific reference to the QA Chip.

[4769] The intention is to show the broad form of possible protocols for use in different authentication situations, so that they can be used as a reference when subsequently defining an implementation specification for a particular application. As mentioned earlier, although the protocols are described in relation to a printing environment, many of them have wider application such as, but not limited to, those described at the end of this specification.

[4770] 2 Nomenclature

[4771] The following symbolic nomenclature is used throughout this document:

TABLE 228 Summary of symbolic nomenclature

Symbol                  Description
F[X]                    Function F, taking a single parameter X
F[X, Y]                 Function F, taking two parameters, X and Y
X | Y                   X concatenated with Y
X ∧ Y                   Bitwise X AND Y
X ∨ Y                   Bitwise X OR Y (inclusive-OR)
X ⊕ Y                   Bitwise X XOR Y (exclusive-OR)
¬X                      Bitwise NOT X (complement)
X ← Y                   X is assigned the value Y
X ← {Y, Z}              The domain of assignment inputs to X is Y and Z
X = Y                   X is equal to Y
X ≠ Y                   X is not equal to Y
[glyph missing] X       Decrement X by 1 (floor 0)
[glyph missing] X       Increment X by 1 (modulo register length)
Erase X                 Erase Flash memory register X
SetBits[X, Y]           Set the bits of the Flash memory register X based on Y
Z ← ShiftRight[X, Y]    Shift register X right one bit position, taking input bit from Y and placing the output bit in Z

[4772] 3 Pseudocode

[4773] 3.1 Asynchronous

[4774] The following pseudocode:

[4775] var=expression

[4776] means the var signal or output is equal to the evaluation of the expression.

[4777] 3.2 Synchronous

[4778] The following pseudocode:

[4779] var←expression

[4780] means the var register is assigned the result of evaluating the expression during this cycle.

[4781] 3.3 Expression

[4782] Expressions are defined using the nomenclature in Table 228 above. Therefore:

[4783] var=(a=b)

[4784] is interpreted as the var signal is 1 if a is equal to b, and 0 otherwise.

[4785] 4. Intentionally Blank

[4786] 5 Basic Protocols

[4787] 5.1 Protocol Background

[4788] This protocol set is a restricted form of a more general case of a multiple key single memory vector protocol. It is a restricted form in that the memory vector M has been optimized for Flash memory utilization:

[4789] M is broken into multiple memory vectors (semi-fixed and variable components) for the purposes of optimizing flash memory utilization. Typically M contains some parts that are fixed at some stage of the manufacturing process (e.g. a batch number, serial number etc.), and once set, are not ever updated. This information does not contain the amount of consumable remaining, and therefore is not read or written to with any great frequency.

[4790] We therefore define M₀ to be the M that contains the frequently updated sections, and the remaining Ms to be rarely written to. Authenticated writes only write to M₀, and non-authenticated writes can be directed to a specific M_(n). This reduces the size of permissions that are stored in the QA Chip (since key-based writes are not required for Ms other than M₀). It also means that M₀ and the remaining Ms can be manipulated in different ways, thereby increasing flash memory longevity.

[4791] 5.2 Requirements of Protocol

[4792] Each QA Chip contains the following values:

[4793] N The maximum number of keys known to the chip.

[4794] T The number of vectors M is broken into.

[4795] K_(N) Array of N secret keys used for calculating F_(Kn)[X] where K_(n) is the nth element of the array.

[4796] R Current random number used to ensure time varying messages. Each chip instance must be seeded with a different initial value. Changes for each signature generation.

[4797] M_(T) Array of T memory vectors. Only M₀ can be written to with an authorized write, while all Ms can be written to in an unauthorized write. Writes to M₀ are optimized for Flash usage, while updates to any other M₁₊ are expensive with regards to Flash utilization, and are expected to be only performed once per section of M_(n). M₁ contains T, N and f in ReadOnly form so users of the chip can know these two values.

[4798] P_(T+N) T+N element array of access permissions for each part of M. Entries n={0 . . . T−1} hold access permissions for non-authenticated writes to M_(n) (no key required). Entries n={T to T+N−1} hold access permissions for authenticated writes to M₀ for K_(n). Permission choices for each part of M are Read Only, Read/Write, and Decrement Only.

[4799] C 3 constants used for generating signatures. C₁, C₂, and C₃ are constants that pad out a sub-message to a hashing boundary, and all 3 must be different.
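The indexing of the permission array P described above can be sketched as follows. The names are hypothetical, each entry is simplified to a single permission rather than one per part of M, and treating Decrement Only as "the new value must be strictly lower" is an assumption.

```python
from enum import Enum


class Permission(Enum):
    READ_ONLY = "Read Only"
    READ_WRITE = "Read/Write"
    DECREMENT_ONLY = "Decrement Only"


def nonauth_entry(t, T):
    """Index into P for non-authenticated writes to M_t (entries 0 to T-1)."""
    if not 0 <= t < T:
        raise IndexError("no such memory vector")
    return t


def auth_entry(n, T, N):
    """Index into P for authenticated writes to M_0 using K_n (entries T to T+N-1)."""
    if not 0 <= n < N:
        raise IndexError("no such key")
    return T + n


def write_allowed(permission, old_value, new_value):
    if permission is Permission.READ_WRITE:
        return True
    if permission is Permission.DECREMENT_ONLY:
        return new_value < old_value  # assumption: value must strictly decrease
    return False  # Read Only
```

With T=3 memory vectors and N=4 keys, P has 7 entries: indices 0 to 2 govern non-authenticated writes and indices 3 to 6 govern authenticated writes to M₀.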

[4800] Each QA Chip contains the following private function:

[4801] S_(Kn)[N,X] Internal function only. Returns S_(Kn)[X], the result of applying a digital signature function S to X based upon the appropriate key K_(n). The digital signature must be long enough to counter the chances of someone generating a random signature. The length depends on the signature scheme chosen, although the scheme chosen for the QA Chip is HMAC-SHA1, and therefore the length of the signature is 160 bits.

[4802] Additional functions are required in certain QA Chips, but these are described as required.

[4803] 5.3 Read protocols

[4804] The set of read protocols describe the means by which a System reads a specific data vector M_(t) from a QA Chip referred to as ChipR.

[4805] We assume that the communications link to ChipR (and therefore ChipR itself) is not trusted. If it were trusted, the System could simply read the data and there is no issue. Since the communications link to ChipR is not trusted and ChipR cannot be trusted, the System needs a way of authenticating the data as actually being from a real ChipR.

[4806] Since the read protocol must be capable of being implemented in physical QA Chips, we cannot use asymmetric cryptography (for example the ChipR signs the data with a private key, and System validates the signature using a public key).

[4807] This document describes two read protocols:

[4808] direct validation of reads

[4809] indirect validation of reads.

[4810] 5.3.1 Direct Validation of Reads

[4811] In a direct validation read protocol we require two QA Chips: ChipR is the QA Chip being read, and ChipT is the QA Chip we entrust to tell us whether or not the data read from ChipR is trustworthy. The basic idea is that System asks ChipR for data, and ChipR responds with the data and a signature based on a secret key. System then asks ChipT whether the signature supplied by ChipR is correct. If ChipT responds that it is, then System can trust that data just read from ChipR. Every time data is read from ChipR, the validation procedure must be carried out.

[4812] Direct validation requires the System to trust the communication line to ChipT. This could be because ChipT is in physical proximity to the System, and both System and ChipT are in a trusted (e.g. Silverbrook secure) environment. However, since we need to validate the read, ChipR by definition must be in a non-trusted environment.

[4813] Each QA Chip protects its signature generation or verification mechanism by the use of a nonce.

[4814] The protocol requires the following publicly available functions in ChipT:

[4815] Random Returns R (does not advance R).

[4816] Test[n,X, Y, Z] Advances R and returns 1 if S_(Kn)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content.

[4817] The protocol requires the following publicly available functions in ChipR:

[4818] Read[n, t, X] Advances R, and returns R, M_(t), S_(Kn)[X|R|C₁|M_(t)]. The time taken to calculate the signature must not be based on the contents of X, R, M_(t), or K. If t is invalid, the function assumes t=0.

[4819] To read ChipR's memory M_(t) in a validated way, System performs the following tasks:

[4820] a. System calls ChipT's Random function;

[4821] b. ChipT returns R_(T) to System;

[4822] c. System calls ChipR's Read function, passing in some key number n1, the desired data vector number t, and R_(T) (from b);

[4823] d. ChipR updates R_(R), then calculates and returns R_(R), M_(Rt), S_(Kn1)[R_(T)|R_(R)|C₁|M_(Rt)];

[4824] e. System calls ChipT's Test function, passing in the key to use for signature verification n2, and the results from d (i.e. R_(R), M_(Rt), S_(Kn1)[R_(T)|R_(R)|C₁|M_(Rt)]);

[4825] f. System checks response from ChipT. If the response is 1, then the M_(t) read from ChipR is considered to be valid. If 0, then the M_(t) read from ChipR is considered to be invalid.

[4826] The choice of n1 and n2 must be such that ChipR's K_(n1)=ChipT's K_(n2).

[4827] The data flow for this read protocol is shown in FIG. 328.

[4828] From the System's perspective, the protocol would take on a form like the following pseudocode:

R_(T) ← ChipT.Random( )
R_(R), M_(R), SIG_(R) ← ChipR.Read(keyNumOnChipR, desiredM, R_(T))
ok ← ChipT.Test(keyNumOnChipT, R_(R), M_(R), SIG_(R))
If (ok = 1)
  // M_(R) is to be trusted
Else
  // M_(R) is not to be trusted
EndIf
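The direct validation protocol can be modelled in software as a sketch. The ChipR and ChipT classes and the validated_read function are hypothetical names, C1 is a placeholder value, nonces come from os.urandom, and the comparison uses hmac.compare_digest as one way of keeping comparison time independent of data content. The signed message follows the S_(Kn)[X|R|C₁|M_(t)] layout given for Read.

```python
import hashlib
import hmac
import os

C1 = b"\x01"  # placeholder for the padding constant C1 (assumed value)


def s(key, *parts):
    """Signature function S: HMAC-SHA1 over the concatenated parts."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class ChipR:
    def __init__(self, keys, vectors):
        self.keys, self.vectors, self.r = keys, vectors, os.urandom(20)

    def read(self, n, t, x):
        if not 0 <= t < len(self.vectors):
            t = 0                         # invalid t is treated as t=0
        self.r = os.urandom(20)           # advance R
        m = self.vectors[t]
        return self.r, m, s(self.keys[n], x, self.r, C1, m)


class ChipT:
    def __init__(self, keys):
        self.keys, self.r = keys, os.urandom(20)

    def random(self):
        return self.r                     # does not advance R

    def test(self, n, x, y, z):
        expected = s(self.keys[n], self.r, x, C1, y)
        self.r = os.urandom(20)           # advance R whether or not the test passes
        return 1 if hmac.compare_digest(expected, z) else 0


def validated_read(chip_r, chip_t, n1, n2, t):
    """Steps a to f: return M_t if ChipT accepts ChipR's signature, else None."""
    r_t = chip_t.random()
    r_r, m, sig = chip_r.read(n1, t, r_t)
    return m if chip_t.test(n2, r_r, m, sig) == 1 else None
```

Because Test advances R, presenting the same Read response to ChipT a second time fails, which is the replay protection the nonce is there to provide.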

[4829] With regards to security, if an attacker finds out ChipR's K_(n1), they can replace the ChipR by a fake ChipR because they can create signatures. Likewise, if an attacker finds out ChipT's K_(n2), they can replace the ChipR by a fake ChipR because ChipR's K_(n1)=ChipT's K_(n2). Moreover, they can use the ChipRs on any system that shares the same key.

[4830] The only way of restricting exposure due to key reveals is to restrict the number of systems that match ChipR and ChipT, i.e. vary the key as much as possible. The degree to which this can be done will depend on the application. In the case of a PRINTER_QA acting as a ChipT, and an INK_QA acting as a ChipR, the same key must be used on all systems where the particular INK_QA data must be validated.

[4831] In all cases, ChipR must contain sufficient information to produce a signature. Knowing (or finding out) this information, whatever form it is in, allows clone ChipRs to be built.

[4832] 5.3.2 Indirect Validation of Reads

[4833] In a direct validation protocol (see Section 5.3.1), the System validates the correctness of data read from ChipR by means of a trusted chip ChipT. This is possible because ChipR and ChipT share some secret information.

[4834] However, it is possible to extend trust via indirect validation. This is required when we trust ChipT, but ChipT doesn't know how to validate data from ChipR. Instead, ChipT knows how to validate data from ChipI (some intermediate chip) which in turn knows how to validate data from either another ChipI (and so on up a chain) or ChipR. Thus we have a chain of validation.

[4835] The means of validation chains is translation of signatures. ChipI_(n) translates signatures from higher up the chain (either ChipI_(n−1) or from ChipR at the start of the chain) into signatures capable of being passed to the next stage in the chain (either ChipI_(n+1) or to ChipT at the end of the chain). A given ChipI can only translate signatures if it knows the key of the previous stage in the chain as well as the key of the next stage in the chain.

[4836] The protocol requires the following publicly available functions in ChipI:

[4837] Random

Returns R (does not advance R).

[4838] Translate[n1, X, Y, Z, n2, A] Returns 1, S_(Kn2)[A|R|C₁|Y] and advances R if Z=S_(Kn1)[R|X|C₁|Y]. Otherwise returns 0, 0. The time taken to calculate and compare signatures must be independent of data content.

[4839] The data flow for this signature translation protocol is shown in FIG. 329.

[4840] Note that R_(prev) is eventually R_(R), and R_(next) is eventually R_(T). In the multiple ChipI case, R_(prev) is the R_(I) of ChipI_(n−1) and R_(next) is R_(I) of ChipI_(n+1). The R_(prev) of the first ChipI in the chain is R_(R), and the R_(next) of the last ChipI in the chain is R_(T).

[4841] Assuming at least 1 ChipI, the System would need to perform the following tasks in order to read ChipR's memory M_(t) in an indirectly validated way:

[4842] a. System calls ChipI_(n)'s Random function;

[4843] b. ChipI₀ returns R_(I0) to System;

[4844] c. System calls ChipR's Read function, passing in some key number n0, the desired data vector number t, and R_(I0) (from b);

[4845] d. ChipR updates R_(R), then calculates and returns R_(R), M_(Rt), S_(Kn0)[R_(In)|R_(R)|C₁|M_(Rt)];

[4846] e. System assigns R_(R) to R_(prev) and S_(Kn0)[R_(In)|R_(R)|C₁|M_(Rt)] to SIG_(prev)

[4847] f. System calls the next-chip-in-the-chain's Random function (either ChipI_(n+1) or ChipT)

[4848] g. The next-chip-in-the-chain will return R_(next) to System

[4849] h. System calls ChipI_(n)'s Translate function, passing in n1_(n) (translation input key number), R_(prev), M_(Rt), SIG_(prev), n2_(n) (translation output key number) and the results from g (R_(next));

[4850] i. ChipI returns testResult and SIG₁ to System

[4851] j. If testResult=0, then the validation has failed, and the M_(t) read from ChipR is considered to be invalid. Exit with failure.

[4852] k. If the next chip in the chain is a ChipI, assign SIG₁ to SIG_(prev) and go to step f

[4853] l. System calls ChipT's Test function, passing in n_(t), R_(prev), M_(Rt), and SIG_(prev);

[4854] m. System checks response from ChipT. If the response is 1, then the M_(t) read from ChipR is considered to be valid. If 0, then the M_(t) read from ChipR is considered to be invalid.

[4855] For the Translate function to work, ChipI_(n) and ChipI_(n+1) must share a key. The choice of n1 and n2 in the protocol described must be such that ChipI_(n)'s K_(n2)=ChipI_(n+1)'s K_(n1).

[4856] Note that Translate is essentially a “Test plus re-sign” function. From an implementation point of view the first part of Translate is identical to Test.

[4857] Note that the use of ChipIs and the Translate function merely allows signatures to be transformed. At the end of the translation chain (if present) will be a ChipT requiring the use of a Test function.

[4858] There can be any number of ChipIs in the chain to ChipT as long as the Translate function is used to map signatures between ChipI_(n) and ChipI_(n+1) and so on until arrival at the final destination (ChipT).

[4859] From the System's perspective, a read protocol using at least 1 ChipI would take on a form like the following pseudocode:

R_(next) ← ChipI[0].Random()
R_(prev), M_(R), SIG_(prev) ← ChipR.Read(keyNumOnChipR, desiredM, R_(next))
ok = 1
i = 0
while ((i < iMax) AND ok)
For i ← 0 to iMax
  If (i = iMax)
    R_(next) ← ChipT.Random()
  Else
    R_(next) ← ChipI[i+1].Random()
  EndIf
  ok, SIG_(prev) ← ChipI[i].Translate(iKey[i], R_(prev), M_(R), SIG_(prev), oKey[i], R_(next))
  R_(prev) = R_(next)
  If (ok = 0)
    // M_(R) is not to be trusted
  EndIf
EndFor
ok ← ChipT.Test(keyNumOnChipT, R_(prev), M_(R), SIG_(prev))
If (ok = 1)
  // M_(R) is to be trusted
Else
  // M_(R) is not to be trusted
EndIf
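The chain of Translate calls can be illustrated with a short Python sketch. It is only a sketch under assumptions: signatures are modelled as HMAC-SHA1, C₁ is a placeholder byte, key number 0 is used on ChipR and ChipT, and the names Chip and indirect_read are hypothetical. Following the definition of Translate and the note that R_(prev) is the R_(I) of the preceding ChipI, the sketch hands each stage the R value that the previous stage used when it produced the signature.

```python
import hashlib
import hmac
import os

C1 = b"\x01"  # placeholder for the constant C1


def sign(key, *parts):
    # S_K[a|b|...] modelled as HMAC-SHA1 (an assumption)
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class Chip:
    """Hypothetical stand-in that can play ChipR, ChipI or ChipT."""

    def __init__(self, keys, memory=()):
        self.keys, self.memory, self.r = list(keys), list(memory), os.urandom(20)

    def random(self):
        return self.r

    def read(self, n, t, x):
        self.r = os.urandom(20)
        return self.r, self.memory[t], sign(self.keys[n], x, self.r, C1, self.memory[t])

    def translate(self, n1, x, y, z, n2, a):
        # Translate[n1, X, Y, Z, n2, A]: Test with K_n1, then re-sign with K_n2
        if not hmac.compare_digest(sign(self.keys[n1], self.r, x, C1, y), z):
            return 0, bytes(20)
        out = sign(self.keys[n2], a, self.r, C1, y)
        self.r = os.urandom(20)
        return 1, out

    def test(self, n, x, y, z):
        if hmac.compare_digest(sign(self.keys[n], self.r, x, C1, y), z):
            self.r = os.urandom(20)
            return 1
        return 0


def indirect_read(chip_r, chain, chip_t, t):
    # chain: list of (ChipI, input key number, output key number)
    stages = [c for c, _, _ in chain] + [chip_t]
    r_cur = stages[0].random()                 # R of the stage about to verify
    r_prev, m, sig = chip_r.read(0, t, r_cur)  # r_prev is R_R
    for i, (chip_i, k_in, k_out) in enumerate(chain):
        r_next = stages[i + 1].random()
        ok, sig = chip_i.translate(k_in, r_prev, m, sig, k_out, r_next)
        if ok == 0:
            return None                        # M_R is not to be trusted
        r_prev, r_cur = r_cur, r_next          # next stage sees this ChipI's R as X
    return m if chip_t.test(0, r_prev, m, sig) == 1 else None
```

With one intermediate chip holding keys [K_a, K_b], a ChipR holding K_a and a ChipT holding K_b, indirect_read returns the data vector; any key mismatch along the chain yields None.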

[4860] 5.3.3 Additional Comments on Reads

[4861] In the Memjet printing environment, certain implementations will exist where the operating parameters are stored in QA Chips. In this case, the system must read the data from the QA Chip using an appropriate read protocol.

[4862] If the connection is trusted (e.g. to a virtual QA Chip in software), a generic Read is sufficient. If the connection is not trusted, it is ideal that the System have a trusted ChipT in the form of software (if possible) or hardware (e.g. a QA Chip on board the same silicon package as the microcontroller and firmware). Whether implemented in software or hardware, the QA Chip should contain an appropriate key that is unique per print engine. Such a key setup would allow reads of print engine parameters and also allow indirect reads of consumables (from a consumable QA Chip).

[4863] If the ChipT is physically separate from System (e.g. ChipT is on a board connected to System), System must also occasionally (based on system clock for example) call ChipT's Test function with bad data, expecting a 0 response. This is to reduce the possibility of someone inserting a fake ChipT into the system that always returns 1 for the Test function.
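The occasional bad-data probe described above might look like the following Python sketch. The helper name chip_t_looks_genuine and the 160-bit field length are assumptions for illustration; test_fn stands for whatever call reaches ChipT's Test function in a given system.

```python
import os


def chip_t_looks_genuine(test_fn, key_num, field_len=20):
    """Call Test with random (almost certainly invalid) data.

    A genuine ChipT is expected to answer 0; a fake chip that always
    answers 1 is exposed by this probe.
    """
    bogus_r = os.urandom(field_len)
    bogus_m = os.urandom(field_len)
    bogus_sig = os.urandom(field_len)
    return test_fn(key_num, bogus_r, bogus_m, bogus_sig) == 0
```

The System would interleave such probes with real Test calls so that a fake ChipT cannot tell them apart by timing or order.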

[4864] 5.4 Upgrade Protocols

[4865] This set of protocols describes the means by which a System upgrades a specific data vector M_(t) within a QA Chip (ChipU). The data vector may contain information about the functioning of the device (e.g. the current maximum operating speed) or the amount of a consumable remaining.

[4866] The updating of M_(t) in ChipU falls into two categories:

[4867] non-authenticated writes, where anyone is able to update the data vector

[4868] authenticated writes, where only authorized entities are able to upgrade data vectors

[4869] 5.4.1 Non-Authenticated Writes

[4870] This is the most frequent type of write, and takes place between the System/consumable during normal everyday operation for M₀, and during the manufacturing process for M₁₊.

[4871] In this kind of write, the System wants to change M_(t) within ChipU subject to P. For example, the System could be decrementing the amount of consumable remaining. Although System does not need to know any of the K_(S) or even have access to a trusted chip to perform the write, the System must follow a non-authenticated write by an authenticated read if it needs to know that the write was successful.

[4872] The protocol requires ChipU to contain the following publicly available function:

[4873] Write[t, X] Writes X over those parts of M_(t) subject to P_(t) and the existing value for M.

[4874] To authenticate a write of M_(new) to ChipA's memory M_(new):

[4875] a. System calls ChipU's Write function, passing in M_(new);

[4876] b. The authentication procedure for a Read is carried out (see Section 5.3 on page 604);

[4877] c. If the read succeeds in such a way that M_(new)=M returned in b, the write succeeded. If not, it failed.

[4878] Note that if these parameters are transmitted over an error-prone communications line (as opposed to internally or using an additional error-free transport layer), then an additional checksum would be required to prevent the wrong M from being updated or to prevent the correct M from being updated to the wrong value. For example, SHA-1[t, X] should be additionally transferred across the communications line and checked (either by a wrapper function around Write or in a variant of Write that takes a hash as an extra parameter).
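A wrapper of the kind suggested above could be sketched in Python as follows. The function names pack_write and checked_write, the single-byte encoding of t, and the dictionary standing in for the chip's memory are all assumptions made for illustration; permission handling by P_(t) is omitted.

```python
import hashlib


def pack_write(t, x):
    """Sender side: attach SHA-1[t, X] to the Write parameters."""
    digest = hashlib.sha1(bytes([t]) + x).digest()
    return t, x, digest


def checked_write(memory, t, x, digest):
    """Receiver-side wrapper around Write: reject parameters corrupted in transit."""
    if hashlib.sha1(bytes([t]) + x).digest() != digest:
        return False          # wrong vector number or wrong value: nothing is written
    memory[t] = x             # stand-in for Write[t, X]
    return True
```

Because the hash covers both t and X, a corrupted vector number is caught as well as a corrupted value. The hash is not a secret, so this guards against transmission errors only, not against deliberate tampering.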

[4880] 5.4.2 Authenticated Writes

[4881] In the QA Chip protocols, M₀ is defined to be the only data vector that can be upgraded in an authenticated way. This decision was made primarily to simplify flash management, although it also helps to reduce the permissions storage requirements.

[4882] In this kind of write, System wants to change ChipU's M₀ in an authorized way, without being subject to the permissions that apply during normal operation. For example, a consumable may be at a refilling station and the normally Decrement Only section of M₀ should be updated to include the new valid consumable. In this case, the chip whose M₀ is being updated must authenticate the writes being generated by the external System and in addition, apply the appropriate permission for the key to ensure that only the correct parts of M₀ are updated. Having a different permission for each key is required as when multiple keys are involved, all keys should not necessarily be given open access to M₀. For example, suppose M₀ contains printer speed and a counter of money available for franking. A ChipS that updates printer speed should not be capable of updating the amount of money. Since P_(0..T−1) is used for non-authenticated writes, each K_(n) has a corresponding permission P_(T+n) that determines what can be updated in an authenticated write.

[4883] The basic principle of the authenticated write (or upgrade) protocol is that the new value for the M_(t) must be signed before ChipU accepts it. The QA Chip responsible for generating the signature (ChipS) must first validate that the ChipU is valid by reading the old value for M_(t). Once the old value is seen as valid, a new value can be signed by ChipS and the resultant data plus signature passed to ChipU. Note that both chips distrust each other.

[4884] There are two forms of authenticated writes. The first form is when both ChipU and ChipS directly store the same key. The second is when both ChipU and ChipS store different versions of the key and a transforming procedure is used on the stored key to generate the required key, i.e. the key is indirectly stored. The second form is slightly more complicated, and only has value when the ChipS is not readily available to an attacker.

[4885] 5.4.2.1 Direct Authenticated Writes

[4886] The direct form of the authenticated write protocol is used when the ChipS and ChipU are equally available to an attacker. For example, suppose that ChipU contains a printer's operating speed. Suppose that the speed can be increased by purchasing a ChipS and inserting it into the printer system. In this case, the ChipS and ChipU are equally available to an attacker. This is different from upgrading the printer over the internet where the effective ChipS is in a remote location, and thereby not as readily available to an attacker.

[4887] The direct authenticated write protocol requires ChipU to contain the following publicly available functions:

[4888] Read[n, t, X] Advances R, and returns R, M_(t), S_(Kn)[X|R|C₁|M_(t)]. The time taken to calculate the signature must not be based on the contents of X, R, M_(t), or K.

[4889] WriteA[n, X, Y, Z] Advances R, replaces M₀ by Y subject to P_(T+n), and returns 1 only if S_(Kn)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content. This function is identical to ChipT's Test function except that it additionally writes Y subject to P_(T+n) to its M when the signature matches.

[4890] Authenticated writes require that the System has access to a ChipS that is capable of generating appropriate signatures.

[4891] In its basic form, ChipS requires the following variables and function:

[4892] SignM[n, V, W, X, Y, Z] Advances R, and returns R, S_(Kn)[W|R|C₁|Z] only if Y=S_(Kn)[V|W|C₁|X]. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[4893] To update ChipU's M vector:

[4894] a. System calls ChipU's Read function, passing in n1, 0 (desired vector number) and 0 (the random value, but is a don't-care value) as the input parameters;

[4895] b. ChipU produces R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)] and returns these to System;

[4896] c. System calls ChipS's SignM function, passing in n2 (the key to be used in ChipS), 0 (the random value as used in a), R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)], and M_(D) (the desired vector to be written to ChipU);

[4897] d. ChipS produces R_(S) and S_(Kn2)[R_(U)|R_(S)|C₁|M_(D)] if the inputs were valid, and 0 for all outputs if the inputs were not valid.

[4898] e. If values returned in d are non zero, then ChipU is considered authentic. System can then call ChipU's WriteA function with these values from d.

[4899] f. ChipU should return a 1 to indicate success. A 0 should only be returned if the data generated by ChipS is incorrect (e.g. a transmission error).

[4900] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).

[4901] The data flow for authenticated writes is shown in FIG. 330.
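Steps a to f of the direct authenticated write can be sketched in Python. The sketch makes several assumptions that are not part of the protocol text: signatures are modelled as HMAC-SHA1, C₁ is a placeholder byte, each chip holds a single key (so the key numbers n1 and n2 are dropped), the don't-care random value 0 is a 20-byte zero block, the permission mask P_(T+n) is omitted, and R is advanced in WriteA whether or not the signature matches. The class and function names are hypothetical.

```python
import hashlib
import hmac
import os

C1 = b"\x01"      # placeholder for the constant C1
ZERO = bytes(20)  # the value 0 as a 160-bit block


def sign(key, *parts):
    # S_K[a|b|...] modelled as HMAC-SHA1 (an assumption)
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class ChipU:
    def __init__(self, key, m0):
        self.key, self.m0, self.r = key, m0, os.urandom(20)

    def read(self, x):
        # Read: advance R, return R, M_0, S_K[X|R|C1|M_0]
        self.r = os.urandom(20)
        return self.r, self.m0, sign(self.key, x, self.r, C1, self.m0)

    def write_a(self, x, y, z):
        # WriteA: accept Y only if Z == S_K[R|X|C1|Y]
        ok = hmac.compare_digest(sign(self.key, self.r, x, C1, y), z)
        self.r = os.urandom(20)
        if ok:
            self.m0 = y
        return 1 if ok else 0


class ChipS:
    def __init__(self, key):
        self.key, self.r = key, os.urandom(20)

    def sign_m(self, v, w, x, y, z):
        # SignM: sign Z only if Y == S_K[V|W|C1|X], i.e. only if ChipU proved itself
        if not hmac.compare_digest(sign(self.key, v, w, C1, x), y):
            return ZERO, ZERO
        self.r = os.urandom(20)
        return self.r, sign(self.key, w, self.r, C1, z)


def upgrade(chip_u, chip_s, m_d):
    r_u, m_u, sig_u = chip_u.read(ZERO)                      # steps a, b
    r_s, sig_s = chip_s.sign_m(ZERO, r_u, m_u, sig_u, m_d)   # steps c, d
    if r_s == ZERO and sig_s == ZERO:
        return False                                         # ChipU is not to be trusted
    return chip_u.write_a(r_s, m_d, sig_s) == 1              # steps e, f
```

The two signature checks run in opposite directions: SignM verifies ChipU's signature over the old value before signing, and WriteA verifies ChipS's signature over the new value before writing.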

[4902] Note that this protocol allows ChipS to generate a signature for any desired memory vector M_(D), and therefore a stolen ChipS has the ability to effectively render the particular keys for those parts of M₀ in ChipU irrelevant.

[4903] It is therefore not recommended that the basic form of ChipS ever be implemented except in specifically controlled circumstances.

[4904] It is much more secure to limit the powers of ChipS. The following list covers some of the variants of limiting the power of ChipS:

[4905] a. the ability to upgrade a limited number of times

[4906] b. the ability to upgrade based on a credit value, i.e. the upgrade amount is decremented from the local value, and effectively transferred to the upgraded device

[4907] c. the ability to upgrade to a fixed value or from a limited list

[4908] d. the ability to upgrade to any value

[4909] e. the ability to only upgrade certain data fields within M

[4910] In many of these variants, the ability to refresh the ChipS in some way (e.g. with a new count or credit value) would be a useful feature.

[4911] In certain cases, the variant is in ChipS, while ChipU remains the same. It may also be desirable to create a ChipU variant, for example allowing ChipU to be upgraded only a specific number of times.

[4912] 5.4.2.1.1 Variant Example

[4913] This section details the variant for the ability to upgrade a memory vector to any value a specific number of times, but the upgrade is only allowed to affect certain fields within the memory vector, i.e. a combination of (a), (d), and (e) above.

[4914] In this example, ChipS requires the following variables and function:

[4915] CountRemaining Part of ChipS's M₀ that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P_(0..T−1) for this part of M₀ need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M₀ (assuming ChipS's Ps allows that part of M₀ to be updated).

[4916] Q Part of M that contains the write permissions for updating ChipU's M. By adding Q to ChipS we allow different ChipSs that can update different parts of M_(U). Permissions in ChipS's P_(0..T−1) for this part of M need to be ReadOnly once ChipS has been set up. Therefore Q can only be updated by another ChipS that will perform updates to that part of M.

[4917] SignM[n, V, W, X, Y, Z] Advances R, decrements CountRemaining and returns R, Z_(QX) (Z applied to X with permissions Q), S_(Kn)[W|R|C₁|Z_(QX)] only if Y=S_(Kn)[V|W|C₁|X] and CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[4918] To update ChipU's M vector:

[4919] a. System calls ChipU's Read function, passing in n1, 0 (desired vector number) and 0 (the random value, but is a don't-care value) as the input parameters;

[4920] b. ChipU produces R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)] and returns these to System;

[4921] c. System calls ChipS's SignM function, passing in n2 (the key to be used in ChipS), 0 (as used in a), R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)], and M_(D) (the desired vector to be written to ChipU);

[4922] d. ChipS produces R_(S), M_(QD) (processed by running M_(D) against M_(U0) using Q) and S_(Kn2)[R_(U)|R_(S)|C₁|M_(QD)] if the inputs were valid, and 0 for all outputs if the inputs were not valid.

[4923] e. If values returned in d are non zero, then ChipU is considered authentic. System can then call ChipU's WriteA function with these values from d.

[4924] f. ChipU should return a 1 to indicate success. A 0 should only be returned if the data generated by ChipS is incorrect (e.g. a transmission error).

[4925] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).

[4926] The data flow for this variant of authenticated writes is shown in FIG. 331.

[4927] Note that Q in ChipS is part of ChipS's M. This allows a user to set up ChipS with a permission set for upgrades. This should be done to ChipS and that part of M designated by P_(0..T−1) set to ReadOnly before ChipS is programmed with K_(U). If ChipS is programmed with K_(U) first, there is a risk of someone obtaining a half-setup ChipS and changing all of M_(U) instead of only the sections specified by Q.

[4928] In addition, CountRemaining in ChipS needs to be set up (including making it ReadOnly in P_(S)) before ChipS is programmed with K_(U). ChipS should therefore be programmed to only perform a limited number of SignM operations (thereby limiting compromise exposure if a ChipS is stolen). Thus ChipS would itself need to be upgraded with a new CountRemaining every so often.
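The limited SignM of this variant can be sketched in Python as follows. The sketch assumes HMAC-SHA1 for the signature, a placeholder byte for C₁, a single key per chip, and a Q that grants or denies write access per byte of the vector; the granularity of real fields is not specified here. The names apply_q and LimitedChipS are hypothetical.

```python
import hashlib
import hmac
import os

C1 = b"\x01"      # placeholder for the constant C1
ZERO = bytes(20)


def sign(key, *parts):
    # S_K[a|b|...] modelled as HMAC-SHA1 (an assumption)
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def apply_q(old, new, q):
    """Z_QX: take bytes of `new` where Q permits an update, keep `old` elsewhere."""
    return bytes(n if allowed else o for o, n, allowed in zip(old, new, q))


class LimitedChipS:
    def __init__(self, key, q, count_remaining):
        self.key, self.q, self.count_remaining = key, q, count_remaining
        self.r = os.urandom(20)

    def sign_m(self, v, w, x, y, z):
        valid = hmac.compare_digest(sign(self.key, v, w, C1, x), y)
        if not valid or self.count_remaining <= 0:
            return ZERO, bytes(len(x)), ZERO     # all 0s
        self.r = os.urandom(20)
        self.count_remaining -= 1                # one fewer signature allowed
        z_qx = apply_q(x, z, self.q)
        return self.r, z_qx, sign(self.key, w, self.r, C1, z_qx)
```

Because the signature covers Z_(QX) rather than the requested Z, the System must pass the returned Z_(QX) to WriteA; a request that touches fields outside Q is silently narrowed to the permitted fields.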

[4929] 5.4.2.2 Indirect Authenticated Writes

[4930] This section describes an alternative authenticated write protocol when ChipU is more readily available to an attacker and ChipS is less available to an attacker. We can store different keys on ChipU and ChipS, and implement a mapping between them in such a way that if the attacker is able to obtain a key from a given ChipU, they cannot upgrade all ChipUs.

[4931] In the general case, this is accomplished by storing key K_(S) on ChipS, and K_(U) and f on ChipU. The relationship is f(K_(S))=K_(U) such that knowledge of K_(U) and f does not make it easy to determine K_(S). This implies that a one-way function is desirable for f.

[4932] In the QA Chip domain, we define f as a number (e.g. 32 bits) such that SHA1(K_(S)|f)=K_(U). The value of f (random between chips) can be stored in a known location within M₁ as a constant for the life of the QA Chip. It is possible to use the same f for multiple relationships if desired, since f is public and the protection lies in the fact that f varies between QA Chips (preferably in a non-predictable way).
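The relationship SHA1(K_(S)|f)=K_(U) can be written down directly. In the following Python sketch the function name derive_chip_key and the big-endian 32-bit encoding of f are assumptions; the text only states that f is a number of, for example, 32 bits.

```python
import hashlib


def derive_chip_key(k_s: bytes, f: int) -> bytes:
    """K_U = SHA1(K_S | f), with f encoded as a 32-bit big-endian number (encoding assumed)."""
    return hashlib.sha1(k_s + f.to_bytes(4, "big")).digest()
```

ChipS can recompute any ChipU's key from K_(S) and that chip's public f, whereas a key extracted from one ChipU is only valid for chips sharing the same f.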

[4933] The indirect protocol is the same as the direct protocol with the exception that f is additionally passed in to the SignM function so that ChipS is able to generate the correct key. The System obtains f by performing a Read of M₁. Note that all other functions, including the WriteA function in ChipU, are identical to their direct authentication counterparts.

[4934] SignM[f, n, V, W, X, Y, Z] Advances R, and returns R, S_(f(Kn))[W|R|C₁|Z] only if Y=S_(f(Kn))[V|W|C₁|X] and CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[4935] Before reading ChipU's memory M₀ (the pre-upgrade value), the System must extract f from ChipU by performing the following tasks:

[4936] a. System calls ChipU's Read function, passing in (dontCare, 1, dontCare)

[4937] b. ChipU returns M₁, from which System can extract f_(U)

[4938] c. System stores f_(U) for future use

[4939] To update ChipU's M vector, the protocol is identical to that described in the basic authenticated write protocol with the exception of steps c and d:

[4940] c. System calls ChipS's SignM function, passing in f_(U), n2 (the key to be used in ChipS), 0 (as used in a), R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)], and M_(D) (the desired vector to be written to ChipU);

[4941] d. ChipS produces R_(S) and S_(fU(Kn2))[R_(U)|R_(S)|C₁|M_(D)] if the inputs were valid, and 0 for all outputs if the inputs were not valid.

[4942] In addition, the choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's f_(U)(K_(n2)).

[4943] Note that f_(U) is obtained from M₁ without validation. This is because there is nothing to be gained by subverting the value of f_(U) (because then the signatures won't match).

[4944] From the System's perspective, the protocol would take on a form like the following pseudocode:

dontCare, M_(R), dontCare ← ChipR.Read(dontCare, 1, dontCare)
f_(R) = extract from M_(R)
...
R_(U), M_(U), SIG_(U) ← ChipU.Read(keyNumOnChipU, 0, 0)
R_(S), SIG_(S) = ChipS.SignM2(f_(R), keyNumOnChipS, 0, R_(U), M_(U), SIG_(U), M_(D))
If (R_(S) = SIG_(S) = 0)
  // ChipU and therefore M_(U) is not to be trusted
Else
  // ChipU and therefore M_(U) can be trusted
  ok = ChipU.WriteA(keyNumOnChipU, R_(S), M_(D), SIG_(S))
  If (ok)
    // updating of data in ChipU was successful
  Else
    // transmission error during WriteA
  EndIf
EndIf

[4945] 5.4.2.2.1 Variant Example

[4946] The indirect form of the example from Section 5.4.2.1.1 is shown here.

[4947] SignM[f, n, V, W, X, Y, Z] Advances R, decrements CountRemaining and returns R, Z_(QX) (Z applied to X with permissions Q), S_(f(Kn))[W|R|C₁|Z_(QX)] only if Y=S_(f(Kn))[V|W|C₁|X] and CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[4948] Before reading ChipU's memory M₀ (the pre-upgrade value), the System must extract f from ChipU by performing the following tasks:

[4949] a. System calls ChipU's Read function, passing in (dontCare, 1, dontCare)

[4950] b. ChipU returns M₁, from which System can extract f_(U)

[4951] c. System stores f_(U) for future use

[4952] To update ChipU's M vector, the protocol is identical to that described in the basic authenticated write protocol with the exception of steps c and d:

[4953] c. System calls ChipS's SignM function, passing in f_(U), n2 (the key to be used in ChipS), 0 (as used in a), R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)], and M_(D) (the desired vector to be written to ChipU);

[4954] d. ChipS produces R_(S), M_(QD) (processed by running M_(D) against M_(U0) using Q) and S_(fU(Kn2))[R_(U)|R_(S)|C₁|M_(QD)] if the inputs were valid, and 0 for all outputs if the inputs were not valid.

[4955] In addition, the choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's f_(U)(K_(n2)).

[4956] Note that f_(U) is obtained from M₁ without validation. This is because there is nothing to be gained by subverting the value of f_(U) (because then the signatures won't match).

[4957] From the System's perspective, the protocol would take on a form like the following pseudocode:

dontCare, M_(R), dontCare ← ChipR.Read(dontCare, 1, dontCare)
f_(R) = extract from M_(R)
...
R_(U), M_(U), SIG_(U) ← ChipU.Read(keyNumOnChipU, 0, 0)
R_(S), M_(QD), SIG_(S) = ChipS.SignM2(f_(R), keyNumOnChipS, 0, R_(U), M_(U), SIG_(U), M_(D))
If (R_(S) = M_(QD) = SIG_(S) = 0)
  // ChipU and therefore M_(U) is not to be trusted
Else
  // ChipU and therefore M_(U) can be trusted
  ok = ChipU.WriteA(keyNumOnChipU, R_(S), M_(QD), SIG_(S))
  If (ok)
    // updating of data in ChipU was successful
  Else
    // transmission error during WriteA
  EndIf
EndIf

[4958] 5.4.3 Updating Permissions for Future Writes

[4959] In order to reduce exposure to accidental and malicious attacks on P (and certain parts of M), only authorized users are allowed to update P. Writes to P are the same as authorized writes to M, except that they update P_(n) instead of M. Initially (at manufacture), P is set to be Read/Write for all M. As different processes fill up different parts of M, they can be sealed against future change by updating the permissions. Updating a chip's P_(0..T−1) changes permissions for unauthorized writes to M_(n), and updating P_(T..T+N−1) changes permissions for authorized writes with key K_(n).

[4960] P_(n) is only allowed to change to be a more restrictive form of itself. For example, initially all parts of M have permissions of Read/Write. A permission of Read/Write can be updated to Decrement Only or Read Only. A permission of Decrement Only can be updated to become Read Only. A Read Only permission cannot be further restricted.
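The rule that a permission may only become more restrictive amounts to an ordering on the three permission values. A minimal Python sketch, in which the numeric encoding and the names Perm and restrict are assumptions for illustration:

```python
from enum import IntEnum


class Perm(IntEnum):
    # Ordered from least to most restrictive (numeric values are an assumed encoding)
    READ_WRITE = 0
    DECREMENT_ONLY = 1
    READ_ONLY = 2


def restrict(current: Perm, requested: Perm) -> Perm:
    """Return the new permission, ignoring any request that would loosen it."""
    return requested if requested >= current else current
```

Under this encoding a request for READ_WRITE never changes anything, which happens to match the convention that passing 0 leaves a permission unchanged.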

[4961] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[4962] The protocol requires the following publicly available functions in ChipU:

[4963] Random

Returns R (does not advance R).

[4964] SetPermission[n, p, X, Y, Z] Advances R, and updates P_(p) according to Y and returns 1 followed by the resultant P_(p) only if S_(Kn)[R|X|Y|C₂]=Z. Otherwise returns 0. P_(p) can only become more restricted. Passing in 0 for any permission leaves it unchanged (passing in Y=0 returns the current P_(p)).

[4965] Authenticated writes of permissions require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variable:

[4966] CountRemaining Part of ChipS's M₀ that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P_(0..T−1) for this part of M₀ need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M₀ (assuming ChipS's P_(n) allows that part of M₀ to be updated).

[4967] In addition, ChipS requires either of the following two SignP functions depending on whether direct or indirect key storage is used (see direct vs indirect authenticated write protocols in Section 5.4.2):

[4968] SignP[n, X, Y] Used when the same key is directly stored in both ChipS and ChipU. Advances R, decrements CountRemaining and returns R and S_(Kn)[X|R|Y|C₂] only if CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[4969] SignP[f, n, X, Y] Used when the same key is not directly stored in both ChipS and ChipU. In this case ChipU's K_(n1)=ChipS's f(K_(n2)). The function is identical to the direct form of SignP, except that it additionally accepts f and returns S_(f(Kn))[X|R|Y|C₂] instead of S_(Kn)[X|R|Y|C₂].

[4970] 5.4.3.1 Direct Form of SignP

[4971] When the direct form of SignP is used, ChipU's P_(n) is updated as follows:

[4972] a. System calls ChipU's Random function;

[4973] b. ChipU returns R_(U) to System;

[4974] c. System calls ChipS's SignP function, passing in n2, R_(U) and P_(D) (the desired P to be written to ChipU);

[4975] d. ChipS produces R_(S) and S_(Kn2)[R_(U)|R_(S)|P_(D)|C₂] if it is still permitted to produce signatures.

[4976] e. If values returned in d are non zero, then System can then call ChipU's SetPermission function with n1, the desired permission entry p, R_(S), P_(D) and S_(Kn2)[R_(U)|R_(S)|P_(D)|C₂].

[4977] f. ChipU verifies the received signature against its own generated signature S_(Kn1)[R_(U)|R_(S)|P_(D)|C₂] and applies P_(D) to P_(n) if the signature matches

[4978] g. System checks 1st output parameter. 1=success, 0=failure.

[4979] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).

[4980] The data flow for basic authenticated writes to permissions is shown in FIG. 332.
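The direct SignP and SetPermission exchange can be sketched in Python. The sketch assumes HMAC-SHA1 signatures, a placeholder byte for C₂, a single key per chip, and permissions encoded as one byte per entry (0 for ReadWrite, 1 for DecrementOnly, 2 for ReadOnly); none of these encodings is given in the text, and the class names are hypothetical.

```python
import hashlib
import hmac
import os

C2 = b"\x02"      # placeholder for the constant C2
ZERO = bytes(20)


def sign(key, *parts):
    # S_K[a|b|...] modelled as HMAC-SHA1 (an assumption)
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class ChipS:
    def __init__(self, key, count_remaining):
        self.key, self.count_remaining, self.r = key, count_remaining, os.urandom(20)

    def sign_p(self, x, y):
        # SignP: return R and S_K[X|R|Y|C2] only while CountRemaining > 0
        if self.count_remaining <= 0:
            return ZERO, ZERO
        self.r = os.urandom(20)
        self.count_remaining -= 1
        return self.r, sign(self.key, x, self.r, y, C2)


class ChipU:
    def __init__(self, key, perms):
        self.key, self.perms, self.r = key, list(perms), os.urandom(20)

    def random(self):
        return self.r

    def set_permission(self, p, x, y, z):
        # SetPermission: apply Y to P_p only if Z == S_K[R|X|Y|C2]
        ok = hmac.compare_digest(sign(self.key, self.r, x, y, C2), z)
        self.r = os.urandom(20)
        if not ok:
            return 0, None
        self.perms[p] = max(self.perms[p], y[0])   # permissions can only tighten
        return 1, self.perms[p]
```

The System passes ChipU's Random output to sign_p, then forwards R_(S), P_(D) and the signature to set_permission and checks that the first output is 1.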

[4981] 5.4.3.2 Indirect Form of SignP

[4982] When the indirect form of SignP is used in ChipS, the System must extract f from ChipU (so it knows how to generate the correct key) by performing the following tasks:

[4983] a. System calls ChipU's Read function, passing in (dontCare, 1, dontCare)

[4984] b. ChipU returns M₁, from which System can extract f_(U)

[4985] c. System stores f_(U) for future use

[4986] ChipU's P_(n) is updated as follows:

[4987] a. System calls ChipU's Random function;

[4988] b. ChipU returns R_(U) to System;

[4989] c. System calls ChipS's SignP function, passing in f_(U), n2, R_(U) and P_(D) (the desired P to be written to ChipU);

[4990] d. ChipS produces R_(S) and S_(fU(Kn2))[R_(U)|R_(S)|P_(D)|C₂] if it is still permitted to produce signatures.

[4991] e. If values returned in d are non zero, then System can then call ChipU's SetPermission function with n1, the desired permission entry p, R_(S), P_(D) and S_(fU(Kn2))[R_(U)|R_(S)|P_(D)|C₂].

[4992] f. ChipU verifies the received signature against S_(Kn1)[R_(U)|R_(S)|P_(D)|C₂] and applies P_(D) to P_(n) if the signature matches

[4993] g. System checks 1st output parameter. 1=success, 0=failure.

[4994] In addition, the choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's f_(U)(K_(n2)).

[4995] 5.4.4 Protecting Memory Vectors

[4996] To protect the appropriate part of M_(n) against unauthorized writes, call SetPermissions[n] for n=0 to T−1. To protect the appropriate part of M₀ against authorized writes with key n, call SetPermissions[T+n] for n=0 to N−1.

[4997] Note that only M₀ can be written in an authenticated fashion.

[4998] Note that the SetPermission function must be called after the part of M has been set to the desired value.

[4999] For example, if adding a serial number to an area of M₁ that is currently ReadWrite so that no one is permitted to update the number again:

[5000] the Write function is called to write the serial number to M₁

[5001] SetPermission(1) is called to set that part of M to be ReadOnly for non-authorized writes.

[5002] If adding a consumable value to M₀ such that only keys 1-2 can update it, and keys 0, and 3-N cannot:

[5003] the Write function is called to write the amount of consumable to M

[5004] SetPermission is called for 0 to set that part of M₀ to be DecrementOnly for non-authorized writes. This allows the amount of consumable to decrement.

[5005] SetPermission is called for n={T, T+3, T+4 . . . , T+N−1} to set that part of M₀ to be ReadOnly for authorized writes using all but keys 1 and 2. This leaves keys 1 and 2 with ReadWrite permissions to M₀.

[5006] It is possible for someone who knows a key to further restrict other keys, but it is not in anyone's interest to do so.

[5007] 5.5 Programming K

[5008] In this case, we have a factory chip (ChipF) connected to a System. The System wants to program the key in another chip (ChipP). System wants to avoid passing the new key to ChipP in the clear, and also wants to avoid the possibility of the key-upgrade message being replayed on another ChipP (even if the user doesn't know the key).

[5009] The protocol assumes that ChipF and ChipP already share (directly or indirectly) a secret key K_(old). This key is used to ensure that only a chip that knows K_(old) can set K_(new).

[5010] Although the example shows a ChipF that is only allowed to program a specific number of ChipPs, the key-upgrade protocol can be easily altered (similar to the way the write protocols have variants) to provide other means of limiting the ability to update ChipPs.

[5011] The protocol requires the following publicly available functions in ChipP:

[5012] Random

Returns R (does not advance R).

[5013] ReplaceKey[n, X, Y, Z] Replaces K_(n) by S_(Kn)[R|X|C₃]⊕Y, advances R, and returns 1 only if S_(Kn)[X|Y|C₃]=Z. Otherwise returns 0. The time taken to calculate signatures and compare values must be identical for all inputs.
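The effect of ReplaceKey can be illustrated with a Python sketch. It assumes HMAC-SHA1 signatures, 160-bit keys, a placeholder byte for C₃ and a single key per chip. The helper make_key_message is a hypothetical stand-in for the factory side: it builds a message of the shape ReplaceKey expects (a random value, the new key XORed with a signature over ChipP's R, and a signature over the first two fields) and is not a function defined by the protocol.

```python
import hashlib
import hmac
import os

C3 = b"\x03"  # placeholder for the constant C3


def sign(key, *parts):
    # S_K[a|b|...] modelled as HMAC-SHA1 (an assumption)
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


class ChipP:
    def __init__(self, key):
        self.key, self.r = key, os.urandom(20)

    def random(self):
        return self.r

    def replace_key(self, x, y, z):
        # ReplaceKey: check Z == S_K[X|Y|C3], then K <- S_K[R|X|C3] xor Y
        if not hmac.compare_digest(sign(self.key, x, y, C3), z):
            return 0
        self.key = xor(sign(self.key, self.r, x, C3), y)
        self.r = os.urandom(20)
        return 1


def make_key_message(k_old, r_p, k_new):
    # Hypothetical factory-side helper: the new key never travels in the clear
    r_f = os.urandom(20)
    enc = xor(sign(k_old, r_p, r_f, C3), k_new)
    return r_f, enc, sign(k_old, r_f, enc, C3)
```

Since the mask depends on the receiving chip's own R, the same message delivered to a different ChipP (or to the same chip after R has advanced) does not decrypt to K_(new).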

[5014] And the following data and functions in ChipF:

[5015] CountRemaining Part of M₀ that contains the number of signatures that ChipF is allowed to generate. Decrements with each successful call to GetProgramKey. Permissions in P for this part of M₀ need to be ReadOnly once ChipF has been set up. Therefore it can only be updated by a ChipS that has authority to perform updates to that part of M₀.

[5016] K_(new) The new key to be transferred from ChipF to ChipP. Must not be visible. After manufacture, K_(new) is 0.

[5017] SetPartialKey[X] Updates K_(new) to be K_(new)⊕X. This function allows K_(new) to be programmed in any number of steps, thereby allowing different people or systems to know different parts of the key (but not the whole K_(new)). K_(new) is stored in ChipF's flash memory.

[5018] In addition, ChipF requires either of the following GetProgramKeyfunctions depending on whether direct or indirect key storage is used onthe input key and/or output key (see direct vs indirect authenticatedwrite protocols in Section 5.4.2):

[5019] GetProgramKey1[n, X] Direct to direct. Used when the same key(K_(n)) is directly stored in both ChipF and ChipP and we want to storeK_(new) in ChipP. Advances R_(F), decrements CountRemaining, outputsR_(F), the encrypted key S_(Kn)[X|R_(F)|C₃]⊕K_(new) and a signature ofthe first two outputs plus C₃ if CountRemaining>0. Otherwise outputs 0.The time to calculate the encrypted key & signature must be identicalfor all inputs.

[5020] GetProgramKey2[f, n, X] Direct to indirect. Used when the samekey (K_(n)) is directly stored in both ChipF and ChipP but we want tostore f_(P)(K_(new)) in ChipP instead of simply K_(new) (i.e. we want tokeep the key in ChipP to be different in all ChipPs). In this caseChipP's K_(n1)=ChipF's f_(P)(K_(n2)). The function is identical toGetProgramKey1, except that it additionally accepts f_(P), and returnsS_(Kn)[X|R_(F)|C₃]⊕f_(P)(K_(new)) instead of S_(Kn)[X|R_(F)|C₃]⊕K_(new).Note that the produced signature is produced using K_(n) since that iswhat is already stored in ChipP.

[5021] GetProgramKey3[f, n, X] Indirect to direct. Used when the samekey is not directly stored in both ChipF and ChipP but we want to storeK_(new) in ChipP. In this case ChipP's K_(n1)=ChipF's f_(P)(K_(n2)). Thefunction is identical to GetProgramKey1, except that it additionallyaccepts f_(P), and returns S_(fP(Kn))[X|R_(F)|C₃]⊕K_(new) instead ofS_(Kn)[X|R_(F)|C₃]⊕K_(new). The produced signature is produced usingf_(P)(Kn) instead of K_(n) since that is what is already stored inChipP.

[5022] GetProgramKey4[f, n, X] Indirect to indirect. Used when the samekey is not directly stored in both ChipF and ChipP but we want to storef_(P)(K_(new)) in ChipP instead of simply K_(new) (i.e. we want to keepthe key in ChipP to be different in all ChipPs). In this case ChipP'sK_(n1)=ChipF's f_(P)(K_(n2)). The function is identical toGetProgramKey3, except that it returnsS_(fP(Kn))[X|R_(F)|C₃]⊕f_(P)(K_(new)) instead ofS_(fP(Kn))[X|R_(F)|C₃]⊕K_(new). The produced signature is produced usingf_(P)(K_(n)) since that is what is already stored in ChipP.

[5023] Since there are likely to be few ChipFs, and many ChipPs, theindirect forms of GetProgramKey can be usefully employed.
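The SetPartialKey behaviour described above can be illustrated with a minimal sketch. It assumes a 160-bit key width and models ChipF's flash storage as a plain Python attribute; the class name `ChipFKeyStore` and the constant `KEY_BYTES` are hypothetical and are not taken from this document.

```python
import secrets

KEY_BYTES = 20  # assumption: 160-bit keys


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class ChipFKeyStore:
    """Hypothetical model of the K_new storage in ChipF."""

    def __init__(self) -> None:
        self.k_new = bytes(KEY_BYTES)  # K_new is 0 after manufacture

    def set_partial_key(self, x: bytes) -> None:
        """SetPartialKey[X]: K_new becomes K_new XOR X."""
        if len(x) != KEY_BYTES:
            raise ValueError("partial key has the wrong length")
        self.k_new = xor_bytes(self.k_new, x)


# Three parties each contribute one part; none of them sees the final key.
parts = [secrets.token_bytes(KEY_BYTES) for _ in range(3)]
chip_f = ChipFKeyStore()
for part in parts:
    chip_f.set_partial_key(part)

# The final key is the XOR of all contributed parts.
expected = bytes(KEY_BYTES)
for part in parts:
    expected = xor_bytes(expected, part)
```

Because XOR is its own inverse, anyone who lacks even one part learns nothing useful about the final K_(new) from the parts they do hold.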

[5024] 5.5.1 GetProgramKey1—direct to direct

[5025] With the “old key=direct, new key=direct” form of GetProgramKey, to update P's key:

[5026] a. System calls ChipP's Random function;

[5027] b. ChipP returns R_(P) to System;

[5028] c. System calls ChipF's GetProgramKey function, passing in n2 (the desired key to use) and the result from b;

[5029] d. ChipF updates R_(F), then calculates and returns R_(F), S_(Kn2)[R_(P)|R_(F)|C₃]⊕K_(new), and S_(Kn2)[R_(F)|S_(Kn2)[R_(P)|R_(F)|C₃]⊕K_(new)|C₃];

[5030] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in n1 (the key to use in ChipP) and the response from d;

[5031] f. System checks response from ChipP. If the response is 1, then ChipP's K_(n1) has been correctly updated to K_(new). If the response is 0, ChipP's K_(n1) has not been updated.

[5032] The choice of n1 and n2 must be such that ChipP's K_(n1)=ChipF's K_(n2).

[5033] The data flow for key updates is shown in FIG. 333:

[5034] Note that K_(new) is never passed in the open. An attacker could send its own R_(P), but cannot produce S_(Kn2)[R_(P)|R_(F)|C₃] without K_(n2). The signature based on K_(new) is sent to ensure that ChipP will be able to determine if either of the first two parameters has been changed en route.
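The direct-to-direct exchange can be sketched in software as follows. This is an illustration under stated assumptions rather than the chip implementation: the signature function S_(K) is assumed to be HMAC-SHA1, keys are assumed to be 160 bits, the value used for C₃ is a placeholder, R is advanced by drawing a fresh random value, and the "outputs 0" case is modelled by returning `None`. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets

KEY_BYTES = 20              # assumption: 160-bit keys, same width as an HMAC-SHA1 digest
C3 = b"\x03" * KEY_BYTES    # hypothetical placeholder for the constant C3


def sign(key: bytes, *fields: bytes) -> bytes:
    """S_K[a|b|...]: keyed signature over the concatenated fields (HMAC-SHA1 assumed)."""
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class ChipP:
    """Chip being programmed: exposes Random and ReplaceKey."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.r = secrets.token_bytes(KEY_BYTES)

    def random(self) -> bytes:
        return self.r  # returns R without advancing it

    def replace_key(self, n: int, x: bytes, y: bytes, z: bytes) -> int:
        k_n = self.keys[n]
        # Both values are always computed so the work does not depend on the input.
        candidate = xor_bytes(sign(k_n, self.r, x, C3), y)
        valid = hmac.compare_digest(sign(k_n, x, y, C3), z)
        if valid:
            self.keys[n] = candidate
            self.r = secrets.token_bytes(KEY_BYTES)  # advance R (fresh value assumed)
        return 1 if valid else 0


class ChipF:
    """Factory chip: holds K_new and a limited signature budget."""

    def __init__(self, keys, k_new: bytes, count_remaining: int):
        self.keys = list(keys)
        self.k_new = k_new
        self.count_remaining = count_remaining
        self.r = secrets.token_bytes(KEY_BYTES)

    def get_program_key1(self, n: int, x: bytes):
        if self.count_remaining <= 0:
            return None  # stands in for the "outputs 0" case
        self.r = secrets.token_bytes(KEY_BYTES)  # advance R_F
        self.count_remaining -= 1
        encrypted = xor_bytes(sign(self.keys[n], x, self.r, C3), self.k_new)
        signature = sign(self.keys[n], self.r, encrypted, C3)
        return self.r, encrypted, signature


# System-side driver for steps a to f.
k_old = secrets.token_bytes(KEY_BYTES)
k_new = secrets.token_bytes(KEY_BYTES)
chip_p = ChipP([k_old])
chip_f = ChipF([k_old], k_new, count_remaining=1)

r_p = chip_p.random()                                         # steps a, b
response = chip_f.get_program_key1(0, r_p)                    # steps c, d
result = chip_p.replace_key(0, *response) if response else 0  # steps e, f
```

In this sketch the new key only ever crosses the System XORed with a signature that depends on both R_(P) and R_(F), which is the property the protocol relies on.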

[5035] CountRemaining needs to be set up in M_(F0) (including making it ReadOnly in P) before ChipF is programmed with K_(P). ChipF should therefore be programmed to only perform a limited number of GetProgramKey operations (thereby limiting compromise exposure if a ChipF is stolen). An authorized ChipS can be used to update this counter if necessary (see Section 5.4.2 on page 610).

[5036] 5.5.2 GetProgramKey2—Direct to Indirect

[5037] With the “old key=direct, new key=indirect” form of GetProgramKey, to update P's key, the System must extract f from ChipP (so it can tell ChipF how to generate the correct key) by performing the following tasks:

[5038] a. System calls ChipP's Read function, passing in (dontCare, 1, dontCare)

[5039] b. ChipP returns M₁, from which System can extract f_(P)

[5040] c. System stores f_(P) for future use

[5041] ChipP's key is updated as follows:

[5042] a. System calls ChipP's Random function;

[5043] b. ChipP returns R_(P) to System;

[5044] c. System calls ChipF's GetProgramKey function, passing in f_(P), n2 (the desired key to use) and the result from b;

[5045] d. ChipF updates R_(F), then calculates and returns R_(F), S_(Kn2)[R_(P)|R_(F)|C₃]⊕f_(P)(K_(new)), and S_(Kn2)[R_(F)|S_(Kn2)[R_(P)|R_(F)|C₃]⊕f_(P)(K_(new))|C₃];

[5046] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in n1 (the key to use in ChipP) and the response from d;

[5047] f. System checks response from ChipP. If the response is 1, then ChipP's K_(n1) has been correctly updated to f_(P)(K_(new)). If the response is 0, ChipP's K_(n1) has not been updated.

[5048] The choice of n1 and n2 must be such that ChipP's K_(n1)=ChipF's K_(n2).

[5049] 5.5.3 GetProgramKey3—Indirect to Direct

[5050] With the “old key=indirect, new key=direct” form of GetProgramKey, to update P's key, the System must extract f from ChipP (so it can tell ChipF how to generate the correct key) by performing the following tasks:

[5051] a. System calls ChipP's Read function, passing in (dontCare, 1, dontCare)

[5052] b. ChipP returns M₁, from which System can extract f_(P)

[5053] c. System stores f_(P) for future use

[5054] ChipP's key is updated as follows:

[5055] a. System calls ChipP's Random function;

[5056] b. ChipP returns R_(P) to System;

[5057] c. System calls ChipF's GetProgramKey function, passing in f_(P), n2 (the desired key to use) and the result from b;

[5058] d. ChipF updates R_(F), then calculates and returns R_(F), S_(fP(Kn2))[R_(P)|R_(F)|C₃]⊕K_(new), and S_(fP(Kn2))[R_(F)|S_(fP(Kn2))[R_(P)|R_(F)|C₃]⊕K_(new)|C₃];

[5059] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in n1 (the key to use in ChipP) and the response from d;

[5060] f. System checks response from ChipP. If the response is 1, then ChipP's K_(n1) has been correctly updated to K_(new). If the response is 0, ChipP's K_(n1) has not been updated.

[5061] The choice of n1 and n2 must be such that ChipP's K_(n1)=ChipF's f_(P)(K_(n2)).

[5062] 5.5.4 GetProgramKey4—Indirect to Indirect

[5063] With the “old key=indirect, new key=indirect” form of GetProgramKey, to update P's key, the System must extract f from ChipP (so it can tell ChipF how to generate the correct key) by performing the following tasks:

[5064] a. System calls ChipP's Read function, passing in (dontCare, 1, dontCare)

[5065] b. ChipP returns M₁, from which System can extract f_(P)

[5066] c. System stores f_(P) for future use

[5067] ChipP's key is updated as follows:

[5068] a. System calls ChipP's Random function;

[5069] b. ChipP returns R_(P) to System;

[5070] c. System calls ChipF's GetProgramKey function, passing in f_(P), n2 (the desired key to use) and the result from b;

[5071] d. ChipF updates R_(F), then calculates and returns R_(F), S_(fP(Kn2))[R_(P)|R_(F)|C₃]⊕f_(P)(K_(new)), and S_(fP(Kn2))[R_(F)|S_(fP(Kn2))[R_(P)|R_(F)|C₃]⊕f_(P)(K_(new))|C₃];

[5072] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in n1 (the key to use in ChipP) and the response from d;

[5073] f. System checks response from ChipP. If the response is 1, then ChipP's K_(n1) has been correctly updated to f_(P)(K_(new)). If the response is 0, ChipP's K_(n1) has not been updated.

[5074] The choice of n1 and n2 must be such that ChipP's K_(n1)=ChipF's f_(P)(K_(n2)).
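The indirect-to-indirect variant can be sketched in the same way. The form of f_(P)(K) is not defined in this section, so the sketch assumes it is an HMAC-SHA1 of a chip-specific value f_(P) under the base key K; that choice, the function names, the 8-byte f values and the placeholder for C₃ are all assumptions made for illustration. CountRemaining handling is omitted for brevity.

```python
import hashlib
import hmac
import secrets

KEY_BYTES = 20              # assumption: 160-bit keys
C3 = b"\x03" * KEY_BYTES    # hypothetical placeholder for the constant C3


def sign(key: bytes, *fields: bytes) -> bytes:
    """S_K over the concatenated fields (HMAC-SHA1 assumed)."""
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def derive(key: bytes, f: bytes) -> bytes:
    """f_P(K): assumed here to be HMAC-SHA1 of the chip-specific value f under K."""
    return hmac.new(key, f, hashlib.sha1).digest()


def get_program_key4(k_n2: bytes, k_new: bytes, f_p: bytes, r_p: bytes):
    """ChipF side: works from the base keys but signs with f_P(K_n2)."""
    k_shared = derive(k_n2, f_p)             # equals ChipP's stored K_n1
    r_f = secrets.token_bytes(KEY_BYTES)     # advanced R_F
    encrypted = xor_bytes(sign(k_shared, r_p, r_f, C3), derive(k_new, f_p))
    signature = sign(k_shared, r_f, encrypted, C3)
    return r_f, encrypted, signature


def replace_key(k_n1: bytes, r_p: bytes, r_f: bytes, encrypted: bytes, signature: bytes):
    """ChipP side: returns the new stored key, or None if the signature fails."""
    if not hmac.compare_digest(sign(k_n1, r_f, encrypted, C3), signature):
        return None
    return xor_bytes(sign(k_n1, r_p, r_f, C3), encrypted)


k_base = secrets.token_bytes(KEY_BYTES)   # K_n2, held directly by ChipF
k_new = secrets.token_bytes(KEY_BYTES)    # K_new, held directly by ChipF
f_a, f_b = secrets.token_bytes(8), secrets.token_bytes(8)  # two ChipPs, two f values


def program(f_p: bytes):
    """Run one key update against a ChipP whose stored key is f_P(K_n2)."""
    k_n1 = derive(k_base, f_p)
    r_p = secrets.token_bytes(KEY_BYTES)
    return replace_key(k_n1, r_p, *get_program_key4(k_base, k_new, f_p, r_p))


key_a = program(f_a)
key_b = program(f_b)
```

Each ChipP ends up holding a different stored key even though the same K_(new) was programmed, so under these assumptions extracting one chip's stored key does not directly reveal the key held by another chip.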

[5075] 5.5.5 Chicken and Egg

[5076] The Program Key protocol requires both ChipF and ChipP to know K_(old) (either directly or indirectly). Obviously both chips had to be programmed in some way with K_(old), and thus K_(old) can be thought of as an older K_(new): K_(old) can be placed in chips if another ChipF knows K_(older), and so on.

[5077] Although this process allows a chain of reprogramming of keys, with each stage secure, at some stage the very first key (K_(first)) must be placed in the chips. K_(first) is in fact programmed with the chip's microcode at the manufacturing test station as the last step in manufacturing test. K_(first) can be a manufacturing batch key, changed for each batch or for each customer etc., and can have as short a life as desired. Compromising K_(first) need not result in a complete compromise of the chain of Ks. This is especially true if K_(first) is indirectly stored in ChipPs (i.e. each ChipP holds an f and f(K_(first)) instead of K_(first) directly). One example is where K_(first) (the key stored in each chip after manufacture/test) is a batch key, and can be different per chip. K_(first) may advance to a ComCo specific K_(second) etc. but still remain indirect. A direct form (e.g. K_(final)) only needs to go in if it is actually required at the end of the programming chain.

[5078] Depending on reprogramming requirements, K_(first) can be the same or different for all K_(n).

[5079] 6 Memjet Forms of Protocols

[5080] Physical QA Chips are used in Memjet printer systems to store printer operating parameters as well as consumable parameters.

[5081] 6.1 PRINTER_QA

[5082] A PRINTER_QA is stored within each print engine to perform two primary tasks:

[5083] storage and protection of operating parameters

[5084] a means of indirect read validation of other QA Chip data vectors

[5085] Each PRINTER_QA contains the following keys:

TABLE 229. Keys in PRINTER_QA

Key | Contents | Comments
0 | Upgrade Key | Used to upgrade the operating parameters. Should be the indirect form of the key (i.e. a different key for each PRINTER_QA) so that an indirect form of the write is required.
1 | Consumable Read Validation Key | Used to indirectly read the data from a CONSUMABLE_QA chip using the indirect authenticated read protocol (Section 5.3.2 on page 606).
2 | PrintEngineController Read Validation Key | When reading data from the PRINTER_QA, the system can either trust the data, or must use this key to perform the authenticated read protocol (see Section 5.3 on page 604).
3-n | (reserved) | Currently unused. Could be used to provide a means to indirectly read additional print engine operating parameters à la K1, or provide additional Print Engine validation à la K2.

[5086] Note that if multiple Print Engine Controllers are used (e.g. a multiple SoPEC system), then multiple PrintEngineController Read Validation Keys are required. These keys can be stored within a single PRINTER_QA (e.g. in K₃ and beyond), or can be stored in separate PRINTER_QAs (for example each SoPEC (or group of SoPECs) has an individual PRINTER_QA).

[5087] The functions required in the PRINTER_QA are:

[5088] Random, ReplaceKey, to allow key programming & substitution

[5089] Read, to allow reads of data

[5090] Write, to allow updates of M₁₊ during manufacture

[5091] WriteAuth, to provide a means of updating the M₀ data (operating parameters)

[5092] SetPermissions, to provide a means of updating write permissions

[5093] Test, to provide a means of checking if consumable reads are valid

[5094] Translate, to provide a means of indirect reading of consumable data

[5095] 6.2 CONSUMABLE_QA

[5096] A CONSUMABLE_QA is stored with each consumable (e.g. ink cartridge) to perform two primary tasks:

[5097] storage of consumable related data

[5098] protection of consumable amount remaining

[5099] Each CONSUMABLE_QA contains the following keys:

TABLE 230. Keys in CONSUMABLE_QA

Key | Contents | Comments
0 | Upgrade Key | Used to upgrade the consumable parameters. Should be stored as the indirect form of the key (i.e. a different key for each CONSUMABLE_QA) so that an indirect form of the write is required.
1 | Consumable Read Validation Key | When reading data from the CONSUMABLE_QA, the system can either trust the data, or must use this key to perform either the direct or indirect authenticated read protocol (see Section 5.3 on page 604).
2 | (reserved) | Currently unused.
3-n | (reserved) | Currently unused.

[5100] The functions required in the CONSUMABLE_QA are:

[5101] Random, ReplaceKey, to allow key programming & substitution

[5102] Read, to allow reads of data.

[5103] Write, to allow updates of M₁₊ during manufacture

[5104] WriteAuth, to provide a means of updating the M₀ data (consumable remaining)

[5105] SetPermissions, to provide a means of updating write permissions

[5106] Authentication of Consumables

[5107] 1 Introduction

[5108] Manufacturers of systems that require consumables (such as a laser printer that requires toner cartridges) have struggled with the problem of authenticating consumables, to varying levels of success. Most have resorted to specialized packaging that involves a patent. However this does not stop home refill operations or clone manufacture in countries with weak industrial property protection. The prevention of copying is important to prevent poorly manufactured substitute consumables from damaging the base system. For example, poorly filtered ink may clog print nozzles in an ink jet printer, causing the consumer to blame the system manufacturer and not admit the use of non-authorized consumables.

[5109] To solve the authentication problem, this document describes a QA Chip that contains authentication keys and circuitry specially designed to prevent copying. The chip is manufactured using the standard Flash memory manufacturing process, and is low cost enough to be included in consumables such as ink and toner cartridges. The implementation is approximately 1 mm² in a 0.25 micron flash process, and has an expected manufacturing cost of approximately 10 cents in 2003.

[5110] 2 NSA

[5111] Once programmed, the QA Chips as described here are compliant with the NSA export guidelines since they do not constitute a strong encryption device. They can therefore be practically manufactured in the USA (and exported) or anywhere else in the world.

[5112] 3 Nomenclature

[5113] The following symbolic nomenclature is used throughout this document:

TABLE 231. Summary of symbolic nomenclature

Symbol                  Description
F[X]                    Function F, taking a single parameter X
F[X, Y]                 Function F, taking two parameters, X and Y
X | Y                   X concatenated with Y
X ∧ Y                   Bitwise X AND Y
X ∨ Y                   Bitwise X OR Y (inclusive-OR)
X ⊕ Y                   Bitwise X XOR Y (exclusive-OR)
~X                      Bitwise NOT X (complement)
X ← Y                   X is assigned the value Y
X ← {Y, Z}              The domain of assignment inputs to X is Y and Z
X = Y                   X is equal to Y
X ≠ Y                   X is not equal to Y
⇓X                      Decrement X by 1 (floor 0)
⇑X                      Increment X by 1 (modulo register length)
Erase X                 Erase Flash memory register X
SetBits[X, Y]           Set the bits of the Flash memory register X based on Y
Z ← ShiftRight[X, Y]    Shift register X right one bit position, taking input bit from Y and placing the output bit in Z

[5114] 4 Pseudocode

[5115] 4.1.1 Asynchronous

[5116] The following pseudocode:

[5117] var=expression

[5118] means the var signal or output is equal to the evaluation of the expression.

[5119] 4.1.2 Synchronous

[5120] The following pseudocode:

[5121] var←expression

[5122] means the var register is assigned the result of evaluating the expression during this cycle.

[5123] 4.1.3 Expression

[5124] Expressions are defined using the nomenclature in Table 231 above. Therefore:

[5125] var=(a=b)

[5126] is interpreted as the var signal is 1 if a is equal to b, and 0 otherwise.

[5127] 4.2 Diagrams

[5128] Black is used to denote data, and red to denote 1-bit control-signal lines.

[5129] 4.3 QA Chip Terminology

[5130] This document refers to QA Chips by their function in particular protocols:

[5131] For authenticated reads, ChipA is the QA Chip being authenticated, and ChipT is the QA Chip that is trusted.

[5132] For replacement of keys, ChipP is the QA Chip being programmed with the new key, and ChipF is the factory QA Chip that generates the message to program the new key.

[5133] For upgrades of data in a QA Chip, ChipU is the QA Chip being upgraded, and ChipS is the QA Chip that signs the upgrade value.

[5134] Any given physical QA Chip will contain functionality that allows it to operate as an entity in some number of these protocols.

[5135] Therefore, wherever the terms ChipA, ChipT, ChipP, ChipF, ChipU and ChipS are used in this document, they are referring to logical entities involved in an authentication protocol as defined in subsequent sections.

[5136] Physical QA Chips are referred to by their location. For example, each ink cartridge may contain a QA Chip referred to as an INK_QA, with all INK_QA chips being on the same physical bus. In the same way, the QA Chip inside a printer is referred to as PRINTER_QA, and will be on a separate bus to the INK_QA chips.

[5137] 5 Concepts and Terms

[5138] This chapter provides a background to the problem of authenticating consumables. For more in-depth introductory texts, see [12], [78], and [56].

[5139] 5.1 Basic Terms

[5140] A message, denoted by M, is plaintext. The process of transforming M into ciphertext C, where the substance of M is hidden, is called encryption. The process of transforming C back into M is called decryption. Referring to the encryption function as E, and the decryption function as D, we have the following identities:

E[M]=C

D[C]=M

[5141] Therefore the following identity is true:

D[E[M]]=M

[5142] 5.2 Symmetric Cryptography

[5143] A symmetric encryption algorithm is one where:

[5144] the encryption function E relies on key K₁,

[5145] the decryption function D relies on key K₂,

[5146] K₂ can be derived from K₁, and

[5147] K₁ can be derived from K₂.

[5148] In most symmetric algorithms, K₁ equals K₂. However, even if K₁ does not equal K₂, given that one key can be derived from the other, a single key K can suffice for the mathematical definition. Thus:

E_(K)[M]=C

D_(K)[C]=M

[5149] The security of these algorithms rests very much in the key K. Knowledge of K allows anyone to encrypt or decrypt. Consequently K must remain a secret for the duration of the value of M. For example, M may be a wartime message “My current position is grid position 123-456”. Once the war is over the value of M is greatly reduced, and if K is made public, the knowledge of the combat unit's position may be of no relevance whatsoever. Of course if it is politically sensitive for the combat unit's position to be known even after the war, K may have to remain secret for a very long time.

[5150] An enormous variety of symmetric algorithms exist, from the textbooks of ancient history through to sophisticated modern algorithms. Many of these are insecure, in that modern cryptanalysis techniques (see Section 5.7 on page 646) can successfully attack the algorithm to the extent that K can be derived.

[5151] The security of the particular symmetric algorithm is a function of two things: the strength of the algorithm and the length of the key [78].

[5152] The strength of an algorithm is difficult to quantify, relying on its resistance to cryptographic attacks (see Section 5.7 on page 646). In addition, the longer that an algorithm has remained in the public eye, and yet remained unbroken in the midst of intense scrutiny, the more secure the algorithm is likely to be. By contrast, a secret algorithm that has not been scrutinized by cryptographic experts is unlikely to be secure.

[5153] Even if the algorithm is “perfectly” strong (the only way to break it is to try every key; see Section 5.7.1.5 on page 647), eventually the right key will be found. However, the more keys there are, the more keys have to be tried. If there are N keys, it will take a maximum of N tries. If the key is N bits long, it will take a maximum of 2^(N) tries, with a 50% chance of finding the key after only half the attempts (2^(N−1)). The longer N becomes, the longer it will take to find the key, and hence the more secure it is. What makes a good key length depends on the value of the secret and the time for which the secret must remain secret as well as available computing resources.
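The key-length arithmetic above can be made concrete with a short calculation. The search rate used here is a hypothetical figure chosen only to illustrate the scaling; it is not taken from this document.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def worst_case_tries(key_bits: int) -> int:
    """Maximum number of keys to try for an N-bit key: 2^N."""
    return 2 ** key_bits


def half_space_tries(key_bits: int) -> int:
    """Tries after which there is a 50% chance of success: 2^(N-1)."""
    return 2 ** (key_bits - 1)


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Years needed to try every key at the given (assumed) rate."""
    return worst_case_tries(key_bits) / keys_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e12  # hypothetical: one trillion keys per second

for bits in (56, 64, 90, 128):
    print(f"{bits}-bit key: {years_to_search(bits, ASSUMED_RATE):.3g} years worst case")
```

Each additional key bit doubles the search time, so the gap between 56 bits and 90 bits is a factor of 2^34 at any fixed search rate.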

[5154] In 1996, an ad hoc group of world-renowned cryptographers and computer scientists released a report [9] describing minimal key lengths for symmetric ciphers to provide adequate commercial security. They suggest an absolute minimum key length of 90 bits in order to protect data for 20 years, and stress that increasingly, as cryptosystems succumb to smarter attacks than brute-force key search, even more bits may be required to account for future surprises in cryptanalysis techniques.

[5155] We will ignore most historical symmetric algorithms on the grounds that they are insecure, especially given modern computing technology. Instead, we will discuss the following algorithms:

[5156] DES

[5157] Blowfish

[5158] RC5

[5159] IDEA

[5160] 5.2.1 DES

[5161] DES (Data Encryption Standard) [26] is a US and international standard, where the same key is used to encrypt and decrypt. The key length is 56 bits. It has been implemented in hardware and software, although the original design was for hardware only. The original algorithm used in DES was patented in 1976 (U.S. Pat. No. 3,962,539); the patent has since expired.

[5162] During the design of DES, the NSA (National Security Agency) provided secret S-boxes to perform the key-dependent nonlinear transformations of the data block. After differential cryptanalysis was discovered outside the NSA, it was revealed that the DES S-boxes were specifically designed to be resistant to differential cryptanalysis.

[5163] As described in [95], using 1993 technology, a 56-bit DES key can be recovered by a custom-designed $1 million machine performing a brute force attack in only 35 minutes. For $10 million, the key can be recovered in only 3.5 minutes. DES is clearly not secure now, and will become less so in the future.
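The search rates implied by these figures can be worked out directly. The calculation below assumes the quoted times cover an exhaustive pass over the whole 56-bit key space; the text does not say whether they are worst-case or average times, so the results are indicative only.

```python
DES_KEY_BITS = 56


def implied_rate(key_bits: int, minutes: float) -> float:
    """Keys per second needed to try every key_bits-bit key in the given time."""
    return 2 ** key_bits / (minutes * 60)


rate_1m_machine = implied_rate(DES_KEY_BITS, 35)     # the $1 million machine
rate_10m_machine = implied_rate(DES_KEY_BITS, 3.5)   # the $10 million machine

print(f"$1M machine:  {rate_1m_machine:.3g} keys/s")
print(f"$10M machine: {rate_10m_machine:.3g} keys/s")
```

The tenfold increase in budget buys a tenfold increase in rate, which is the linear cost scaling expected of a brute-force search that parallelises trivially.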

[5164] A variant of DES, called triple-DES, is more secure, but requires 3 keys: K₁, K₂, and K₃. The keys are used in the following manner:

E_(K3)[D_(K2)[E_(K1)[M]]]=C

D_(K1)[E_(K2)[D_(K3)[C]]]=M
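The encrypt-decrypt-encrypt composition can be illustrated with a toy cipher standing in for DES. The 64-bit `toy_encrypt` and `toy_decrypt` functions below are hypothetical, have no cryptographic strength, and exist only to show the order in which the three keys are applied and undone; decryption applies the inverse stages in reverse key order.

```python
MASK = (1 << 64) - 1  # work on 64-bit blocks, like DES


def rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK


def rotr(x: int, r: int) -> int:
    return ((x >> r) | (x << (64 - r))) & MASK


def toy_encrypt(k: int, m: int) -> int:
    """Toy invertible block function standing in for single DES encryption."""
    return (rotl(m ^ k, 13) + k) & MASK


def toy_decrypt(k: int, c: int) -> int:
    """Exact inverse of toy_encrypt under the same key."""
    return rotr((c - k) & MASK, 13) ^ k


def ede_encrypt(k1: int, k2: int, k3: int, m: int) -> int:
    """C = E_K3[D_K2[E_K1[M]]]"""
    return toy_encrypt(k3, toy_decrypt(k2, toy_encrypt(k1, m)))


def ede_decrypt(k1: int, k2: int, k3: int, c: int) -> int:
    """M = D_K1[E_K2[D_K3[C]]]: the stages of ede_encrypt undone in reverse order."""
    return toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k3, c)))


k1, k2, k3 = 0x0123456789ABCDEF, 0x0FEDCBA987654321, 0x1122334455667788
message = 0xDEADBEEFCAFEF00D
ciphertext = ede_encrypt(k1, k2, k3, message)
recovered = ede_decrypt(k1, k2, k3, ciphertext)
```

One consequence of using a decryption step in the middle is that setting K₁ equal to K₂ makes the first two stages cancel, so the composition collapses to a single encryption under K₃. This is why existing single-key implementations can interoperate with the triple form.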

[5165] The main advantage of triple-DES is that existing DES implementations can be used to give more security than single key DES. Specifically, triple-DES gives protection of equivalent key length of 112 bits [78]. Triple-DES does not give the equivalent protection of a 168-bit key (3×56) as one might naively expect.

[5166] Equipment that performs triple-DES decoding and/or encoding cannot be exported from the United States.

[5167] 5.2.2 Blowfish

[5168] Blowfish is a symmetric block cipher first presented by Schneier in 1994 [76]. It takes a variable length key, from 32 bits to 448 bits, is unpatented, and is both license and royalty free. In addition, it is much faster than DES.

[5169] The Blowfish algorithm consists of two parts: a key-expansion part and a data-encryption part. Key expansion converts a key of at most 448 bits into several subkey arrays totaling 4168 bytes. Data encryption occurs via a 16-round Feistel network. All operations are XORs and additions on 32-bit words, with four index array lookups per round.

[5170] It should be noted that decryption is the same as encryption except that the subkey arrays are used in the reverse order. Complexity of implementation is therefore reduced compared to other algorithms that do not have such symmetry.

[5171] [77] describes the published attacks which have been mounted on Blowfish, although the algorithm remains secure as of February 1998 [79]. The major finding with these attacks has been the discovery of certain weak keys. These weak keys can be tested for during key generation. For more information, refer to [77] and [79].

[5172] 5.2.3 RC5

[5173] Designed by Ron Rivest in 1995, RC5 [74] has a variable block size, key size, and number of rounds. Typically, however, it uses a 64-bit block size and a 128-bit key.

[5174] The RC5 algorithm consists of two parts: a key-expansion part and a data-encryption part. Key expansion converts a key into 2r+2 subkeys (where r=the number of rounds), each subkey being w bits. For a 64-bit blocksize with 16 rounds (w=32, r=16), the subkey arrays total 136 bytes. Data encryption uses addition mod 2^(w), XOR and bitwise rotation.

[5175] An initial examination by Kaliski and Yin [43] suggested that standard linear and differential cryptanalysis appeared impractical for the 64-bit blocksize version of the algorithm. Their differential attacks on 9 and 12 round RC5 require 2⁴⁵ and 2⁶² chosen plaintexts respectively, while the linear attacks on 4, 5, and 6 round RC5 require 2³⁷, 2⁴⁷ and 2⁵⁷ known plaintexts. These two attacks are independent of key size.

[5176] More recently however, Knudsen and Meier [47] described a new type of differential attack on RC5 that improved the earlier results by a factor of 128, showing that RC5 has certain weak keys.

[5177] RC5 is protected by multiple patents owned by RSA Laboratories. A license must be obtained to use it.

[5178] 5.2.4 IDEA

[5179] Developed in 1990 by Lai and Massey [53], the first incarnation of the IDEA cipher was called PES. After differential cryptanalysis was discovered by Biham and Shamir in 1991, the algorithm was strengthened, with the result being published in 1992 as IDEA [52].

[5180] IDEA uses 128-bit keys to operate on 64-bit plaintext blocks. The same algorithm is used for encryption and decryption. It is generally regarded as the most secure block algorithm available today [78].

[5181] The biggest drawback of IDEA is the fact that it is patented (U.S. Pat. No. 5,214,703, issued in 1993), and a license must be obtained from Ascom Tech AG (Bern) to use it.

[5182] 5.3 Asymmetric Cryptography

[5183] An asymmetric encryption algorithm is one where:

[5184] the encryption function E relies on key K₁,

[5185] the decryption function D relies on key K₂,

[5186] K₂ cannot be derived from K₁ in a reasonable amount of time, and

[5187] K₁ cannot be derived from K₂ in a reasonable amount of time.

[5188] Thus:

E_(K1)[M]=C

D_(K2)[C]=M

[5189] These algorithms are also called public-key because one key K₁ can be made public. Thus anyone can encrypt a message (using K₁) but only the person with the corresponding decryption key (K₂) can decrypt and thus read the message.

[5190] In most cases, the following identity also holds:

E_(K2)[M]=C

D_(K1)[C]=M

[5191] This identity is very important because it implies that anyone with the public key K₁ can see M and know that it came from the owner of K₂. No-one else could have generated C because to do so would imply knowledge of K₂. This gives rise to a different application, unrelated to encryption: digital signatures.
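Both identities can be checked numerically with textbook RSA, used here as one concrete asymmetric algorithm. The primes, exponent and message are tiny illustrative values chosen for this sketch; real keys are far larger and use padding. The modular inverse via `pow(K1, -1, PHI)` requires Python 3.8 or later.

```python
# Toy parameters (assumptions for illustration only).
P, Q = 61, 53
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)
K1 = 17                      # public key exponent
K2 = pow(K1, -1, PHI)        # private key exponent: inverse of K1 modulo PHI


def apply_key(exponent: int, value: int) -> int:
    """E and D are the same operation in textbook RSA: modular exponentiation."""
    return pow(value, exponent, N)


message = 65

# Encryption: anyone encrypts with K1, only the holder of K2 can decrypt.
ciphertext = apply_key(K1, message)
decrypted = apply_key(K2, ciphertext)

# Signature: only the holder of K2 can produce it, anyone can check it with K1.
signature = apply_key(K2, message)
verified = apply_key(K1, signature)
```

Because exponentiation under the two keys commutes, the same key pair supports both uses: `decrypted` and `verified` each recover the original message.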

[5192] The property of not being able to derive K₁ from K₂ and vice versa in a reasonable time is of course clouded by the concept of reasonable time. What has been demonstrated time after time is that a calculation that was thought to require a long time has been made possible by the introduction of faster computers, new algorithms etc. The security of asymmetric algorithms is based on the difficulty of one of two problems: factoring large numbers (more specifically large numbers that are the product of two large primes), and calculating discrete logarithms in a finite field. Factoring large numbers is conjectured to be a hard problem given today's understanding of mathematics. The problem, however, is that factoring is getting easier much faster than anticipated. Ron Rivest in 1977 said that factoring a 125-digit number would take 40 quadrillion years [30]. In 1994 a 129-digit number was factored [3]. According to Schneier, you need a 1024-bit number to get the level of security today that you got from a 512-bit number in the 1980s [78]. If the key is to last for some years then 1024 bits may not even be enough. Rivest revised his key length estimates in 1990: he suggests 1628 bits for high security lasting until 2005, and 1884 bits for high security lasting until 2015 [69]. Schneier suggests 2048 bits are required in order to protect against corporations and governments until 2015 [80].

[5193] Public key cryptography was invented in 1976 by Diffie and Hellman [15], and independently by Merkle [57]. Although Diffie, Hellman and Merkle patented the concepts (U.S. Pat. Nos. 4,200,770 and 4,218,582), these patents expired in 1997.

[5194] A number of public key cryptographic algorithms exist. Most are impractical to implement, and many generate a very large C for a given M or require enormous keys. Still others, while secure, are far too slow to be practical for several years. Because of this, many public key systems are hybrid: a public key mechanism is used to transmit a symmetric session key, and then the session key is used for the actual messages.

[5195] All of the algorithms have a problem in terms of key selection. A random number is simply not secure enough. The two large primes p and q must be chosen carefully, since there are certain weak combinations that can be factored more easily (some of the weak keys can be tested for). Nonetheless, key selection is not a simple matter of randomly selecting 1024 bits, for example. Consequently the key selection process must also be secure.

[5196] Of the practical algorithms in use under public scrutiny, the following are discussed:

[5197] RSA

[5198] DSA

[5199] ElGamal

[5200] 5.3.1 RSA

[5201] The RSA cryptosystem [75], named after Rivest, Shamir, and Adleman, is the most widely used public key cryptosystem, and is a de facto standard in much of the world [78].

[5202] The security of RSA depends on the conjectured difficulty of factoring large numbers that are the product of two primes (p and q). There are a number of restrictions on the generation of p and q. They should both be large, with a similar number of bits, yet not be close to one another (otherwise p≈q≈√(pq)). In addition, many authors have suggested that p and q should be strong primes [56]. The Hellman-Bach patent (U.S. Pat. No. 4,633,036) covers a method for generating strong RSA primes p and q such that n=pq and factoring n is believed to be computationally infeasible. The RSA algorithm patent was issued in 1983 (U.S. Pat. No. 4,405,829). The patent expires on Sep. 20, 2000.

[5203] 5.3.2 DSA

[5204] DSA (Digital Signature Algorithm) is an algorithm designed as part of the Digital Signature Standard (DSS) [29]. As defined, it cannot be used for generalized encryption. In addition, compared to RSA, DSA is 10 to 40 times slower for signature verification [40]. DSA explicitly uses the SHA-1 hashing algorithm (see Section 5.5.3.3 on page 640).

[5205] DSA key generation relies on finding two primes p and q such that q divides p−1. According to Schneier [78], a 1024-bit p value is required for long term DSA security. However the DSA standard [29] does not permit values of p larger than 1024 bits (p must also be a multiple of 64 bits).

[5206] The US Government owns the DSA algorithm and has at least one relevant patent (U.S. Pat. No. 5,231,688 granted in 1993). However, according to NIST [61]:

[5207] “The DSA patent and any foreign counterparts that may issue are available for use without any written permission from or any payment of royalties to the U.S. government.”

[5208] In a much stronger declaration, NIST states in the same document [61] that DSA does not infringe third parties' rights:

[5209] “NIST reviewed all of the asserted patents and concluded that none of them would be infringed by DSS. Extra protection will be written into the PKI pilot project that will prevent an organization or individual from suing anyone except the government for patent infringement during the course of the project.”

[5210] It must however, be noted that the Schnorr authenticationalgorithm [81] (U.S. Pat. No. 4,995,082) patent holder claims that DSAinfringes his patent. The Schnorr patent is not due to expire until2008.

[5211] 5.3.3 ElGamal

[5212] The ElGamal scheme [22] is used for both encryption and digital signatures. The security is based on the conjectured difficulty of calculating discrete logarithms in a finite field.

[5213] Key selection involves the selection of a prime p, and two random numbers g and x such that both g and x are less than p. Then calculate y=g^(x) mod p. The public key is y, g, and p. The private key is x.
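The key selection just described can be illustrated with a minimal Python sketch. The function name `elgamal_keygen` and the toy parameters p=2579, g=2 are assumptions chosen for illustration only; a real system would use a much larger prime and a carefully chosen g.

```python
import secrets

def elgamal_keygen(p: int, g: int):
    """Pick a random private x < p and derive y = g^x mod p.

    Returns ((y, g, p), x): the public key triple and the private key.
    """
    x = secrets.randbelow(p - 2) + 1   # private key, 1 <= x <= p - 2
    y = pow(g, x, p)                   # modular exponentiation: g^x mod p
    return (y, g, p), x

# Toy parameters (illustrative only, far too small to be secure).
public_key, private_key = elgamal_keygen(p=2579, g=2)
y, g, p = public_key
```

Recovering x from the public values y, g and p is the discrete logarithm problem on which the scheme's security rests.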

[5214] ElGamal is unpatented. Although it uses the patented Diffie-Hellman public key algorithm [15], those patents expired in 1997. ElGamal public key encryption and digital signatures can now be safely used without infringing third party patents.

[5215] 5.4 Cryptographic Challenge-Response Protocols and Zero Knowledge Proofs

[5216] The general principle of a challenge-response protocol is to provide identity authentication. The simplest form of challenge-response takes the form of a secret password. A asks B for the secret password, and if B responds with the correct password, A declares B authentic.

[5217] There are three main problems with this kind of simplistic protocol. Firstly, once B has responded with the password, any observer C will know what the password is. Secondly, A must know the password in order to verify it. Thirdly, if C impersonates A, then B will give the password to C (thinking C was A), thus compromising the password.

[5218] Using a copyrighted text (such as a haiku) as the password is not sufficient, because we are assuming that anyone is able to copy the password (for example in a country where intellectual property is not respected).

[5219] The idea of cryptographic challenge-response protocols is that one entity (the claimant) proves its identity to another (the verifier) by demonstrating knowledge of a secret known to be associated with that entity, without revealing the secret itself to the verifier during the protocol [56]. In the generalized case of cryptographic challenge-response protocols, with some schemes the verifier knows the secret, while in others the secret is not even known by the verifier. A good overview of these protocols can be found in [25], [78], and [56].

[5220] Since this documentation specifically concerns Authentication, the actual cryptographic challenge-response protocols used for authentication are detailed in the appropriate sections. However the concept of Zero Knowledge Proofs bears mentioning here.

[5221] The Zero Knowledge Proof protocol, first described by Feige, Fiat and Shamir in [24], is extensively used in Smart Cards for the purpose of authentication [34]. The protocol's effectiveness is based on the assumption that it is computationally infeasible to compute square roots modulo a large composite integer with unknown factorization. This is provably equivalent to the assumption that factoring large integers is difficult.

[5222] It should be noted that there is no need for the claimant to have significant computing power. Smart cards implement this kind of authentication using only a few modulo multiplications [34].

[5223] Finally, it should be noted that the Zero Knowledge Proof protocol is patented [82] (U.S. Pat. No. 4,748,668, issued May 31, 1988).

[5224] 5.5 One-Way Functions

[5225] A one-way function F operates on an input X, and returns F[X] such that X cannot be determined from F[X]. When there is no restriction on the format of X, and F[X] contains fewer bits than X, then collisions must exist. A collision is defined as two different X input values producing the same F[X] value, i.e. X₁ and X₂ exist such that X₁≠X₂ yet F[X₁]=F[X₂].
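The counting argument behind the existence of collisions can be made concrete with a small Python sketch. The function `toy_f`, which keeps only the first byte of a SHA-1 digest, is a hypothetical stand-in chosen so that the output space (256 values) is small enough to exhaust; it is not a function used elsewhere in this document.

```python
import hashlib

def toy_f(x: bytes) -> int:
    """Toy function with an 8-bit output: the first byte of SHA-1(x)."""
    return hashlib.sha1(x).digest()[0]

def find_collision():
    """Try 257 distinct inputs; with only 256 outputs, two must collide."""
    seen = {}
    for i in range(257):
        x = i.to_bytes(2, "big")
        h = toy_f(x)
        if h in seen:
            return seen[h], x      # two different inputs, same output
        seen[h] = x
    raise AssertionError("unreachable: 257 inputs cannot map to 256 distinct outputs")

x1, x2 = find_collision()
```

With a realistic output size the same argument still guarantees that collisions exist, but says nothing about how hard they are to find.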

[5226] When X contains more bits than F[X], the input must be compressed in some way to create the output. In many cases, X is broken into blocks of a particular size, and compressed over a number of rounds, with the output of one round being the input to the next. The output of the hash function is the last output once X has been consumed. A pseudo-collision of the compression function CF is defined as two different initial values V₁ and V₂ and two inputs X₁ and X₂ (possibly identical) such that CF(V₁, X₁)=CF(V₂, X₂). Note that the existence of a pseudo-collision does not mean that it is easy to compute an X₂ for a given X₁.

[5227] We are only interested in one-way functions that are fast to compute. In addition, we are only interested in deterministic one-way functions that are repeatable in different implementations.

[5228] Consider an example F where F[X] is the time between calls to F. For a given F[X], X cannot be determined because X is not even used by F. However the output from F will be different for different implementations. This kind of F is therefore not of interest.

[5229] In the scope of this document, we are interested in the following forms of one-way functions:

[5230] Encryption using an unknown key

[5231] Random number sequences

[5232] Hash Functions

[5233] Message Authentication Codes

[5234] 5.5.1 Encryption Using an Unknown Key

[5235] When a message M is encrypted using an unknown key K, the encryption function E is effectively one-way. Without K, it is computationally infeasible to obtain M from E_(K)[M]. An encryption function is only one-way for as long as the key remains hidden.

[5236] An encryption algorithm does not create collisions, since E creates E_(K)[M] such that it is possible to reconstruct M using the decryption function D. Consequently F[X] contains at least as many bits as X (no information is lost) if the one-way function F is E.

[5237] Symmetric encryption algorithms (see Section 5.2 on page 629) have the advantage over asymmetric algorithms (see Section 5.3 on page 632) for producing one-way functions based on encryption for the following reasons:

[5238] The key for a given strength encryption algorithm is shorter for a symmetric algorithm than an asymmetric algorithm

[5239] Symmetric algorithms are faster to compute and require less software or silicon

[5240] Note however, that the selection of a good key depends on the encryption algorithm chosen. Certain keys are not strong for particular encryption algorithms, so any key needs to be tested for strength. The more tests that need to be performed for key selection, the less likely the key will remain hidden.

[5241] 5.5.2 Random Number Sequences

[5242] Consider a random number sequence R₀, R₁, . . . , R_(i), R_(i+1). We define the one-way function F such that F[X] returns the X^(th) random number in the random sequence. However we must ensure that F[X] is repeatable for a given X on different implementations. The random number sequence therefore cannot be truly random. Instead, it must be pseudo-random, with the generator making use of a specific seed.

[5243] There are a large number of issues concerned with defining good random number generators. Knuth, in [48], describes what makes a generator “good” (including statistical tests), and the general problems associated with constructing them. Moreau gives a high level survey of the current state of the field in [60].

[5244] The majority of random number generators produce the i^(th) random number from the (i−1)^(th) state: the only way to determine the i^(th) number is to iterate from the 0^(th) number to the i^(th). If i is large, it may not be practical to wait for i iterations.

[5245] However there is a type of random number generator that does allow random access. In [10], Blum, Blum and Shub define the ideal generator as follows: “ . . . we would like a pseudo-random sequence generator to quickly produce, from short seeds, long sequences (of bits) that appear in every way to be generated by successive flips of a fair coin”. They defined the x² mod n generator [10], more commonly referred to as the BBS generator. They showed that given certain assumptions upon which modern cryptography relies, a BBS generator passes extremely stringent statistical tests.

[5246] The BBS generator relies on selecting n which is a Blum integer (n=pq where p and q are large prime numbers, p≠q, p mod 4=3, and q mod 4=3). The initial state of the generator is given by x₀ where x₀=x² mod n, and x is a random integer relatively prime to n. The i^(th) pseudo-random bit is the least significant bit of x_(i) where:

x_(i) = x_(i−1)² mod n

[5247] As an extra property, knowledge of p and q allows a direct calculation of the i^(th) number in the sequence as follows:

x_(i) = x₀^(y) mod n, where y = 2^(i) mod ((p−1)(q−1))

[5248] Without knowledge of p and q, the generator must iterate (the security of calculation relies on the conjectured difficulty of factoring large numbers).
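The two ways of reaching x_(i), by iteration and by direct calculation, can be compared in a short Python sketch. The primes 499 and 547 (both congruent to 3 mod 4), the seed 12345 and the function names are assumptions chosen for illustration; they are far too small for any real use.

```python
from math import gcd

# Toy Blum integer: both primes are congruent to 3 mod 4 (illustrative only).
P, Q = 499, 547
N = P * Q
PHI = (P - 1) * (Q - 1)

def bbs_iterate(x0: int, i: int) -> int:
    """Reach x_i by squaring i times; needs only n, not its factors."""
    x = x0
    for _ in range(i):
        x = (x * x) % N
    return x

def bbs_direct(x0: int, i: int) -> int:
    """Reach x_i in one step using y = 2^i mod ((p-1)(q-1)); needs p and q."""
    y = pow(2, i, PHI)
    return pow(x0, y, N)

seed = 12345                      # hypothetical seed, relatively prime to n
assert gcd(seed, N) == 1
x0 = (seed * seed) % N            # initial state x0 = x^2 mod n

# The i-th pseudo-random bit is the least significant bit of x_i.
bits = [bbs_iterate(x0, i) & 1 for i in range(1, 17)]
```

Both functions return the same state for any i, but only a holder of p and q can use the direct form to jump to a large i without iterating.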

[5249] When first defined, the primary problem with the BBS generator was the amount of work required for a single output bit. The algorithm was considered too slow for most applications. However the advent of Montgomery reduction arithmetic [58] has given rise to more practical implementations, such as [59]. In addition, Vazirani and Vazirani have shown in [93] that depending on the size of n, more bits can safely be taken from x_(i) without compromising the security of the generator.

[5250] Assuming we only take 1 bit per x_(i), N bits (and hence N iterations of the bit generator function) are needed in order to generate an N-bit random number. To the outside observer, given a particular set of bits, there is no way to predict the next bit with better than 50/50 probability. If x, p and q are hidden, they act as a key, and it is computationally infeasible to take an output bit stream and compute x, p, and q. It is also computationally infeasible to determine the value of i used to generate a given set of pseudo-random bits. This last feature makes the generator one-way.

[5251] Different values of i can produce identical bit sequences of a given length (e.g. 32 bits of random bits). Even if x, p and q are known, for a given F[i], i can only be derived as a set of possibilities, not as a certain value (of course if the domain of i is known, then the set of possibilities is reduced further).

[5252] However, there are problems in selecting a good p and q, and a good seed x. In particular, Ritter in [68] describes a problem in selecting x. The nature of the problem is that a BBS generator does not create a single cycle of known length. Instead, it creates cycles of various lengths, including degenerate (zero-length) cycles. Thus a BBS generator cannot be initialized with a random state, since it might be on a short cycle. Specific algorithms exist in section 9 of [10] to determine the length of the period for a given seed given certain strenuous conditions for n.

[5253] 5.5.3 Hash Functions

[5254] Special one-way functions, known as Hash functions, map arbitrary length messages to fixed-length hash values. Hash functions are referred to as H[M]. Since the input is of arbitrary length, a hash function has a compression component in order to produce a fixed length output. Hash functions also have an obfuscation component in order to make it difficult to find collisions and to determine information about M from H[M].

[5255] Because collisions do exist, most applications require that the hash algorithm is second-preimage resistant, in that for a given X₁ it is difficult to find a different X₂ such that H[X₁]=H[X₂]. In addition, most applications also require the hash algorithm to be collision resistant (i.e. it should be hard to find any two messages X₁ and X₂ such that H[X₁]=H[X₂]). However, as described in [20], it is an open problem whether a collision-resistant hash function, in the ideal sense, can exist at all.

[5256] The primary application for hash functions is in the reduction of an input message into a digital “fingerprint” before the application of a digital signature algorithm. One problem of collisions with digital signatures can be seen in the following example.

[5257] A has a long message M₁ that says “I owe B $10”. A signs H[M₁] using his private key. B, being greedy, then searches for a collision message M₂ where H[M₂]=H[M₁] but where M₂ is favorable to B, for example “I owe B $1 million”. Clearly it is in A's interest to ensure that it is difficult to find such an M₂.

[5258] Examples of collision resistant one-way hash functions are SHA-1 [28], MD5 [73] and RIPEMD-160 [66], all derived from MD4 [70].

[5259] 5.5.3.1 MD4

[5260] Ron Rivest introduced MD4 [70] in 1990. It is only mentioned here because the other one-way hash functions described here are derived in some way from MD4.

[5261] MD4 is now considered completely broken [18] in that collisions can be calculated instead of searched for. In the example above, B could trivially generate a substitute message M₂ with the same hash value as the original message M₁.

[5262] 5.5.3.2 MD5

[5263] Ron Rivest introduced MD5 [73] in 1991 as a more secure MD4. Like MD4, MD5 produces a 128-bit hash value. MD5 is not patented [80].

[5264] Dobbertin describes the status of MD5 after recent attacks [20]. He describes how pseudo-collisions have been found in MD5, indicating a weakness in the compression function, and more recently, collisions have been found. This means that MD5 should not be used for compression in digital signature schemes where the existence of collisions may have dire consequences. However MD5 can still be used as a one-way function. In addition, the HMAC-MD5 construct (see Section 5.5.4.1 on page 643) is not affected by these recent attacks.

[5265] 5.5.3.3 SHA-1

[5266] SHA-1 [28] is very similar to MD5, but has a 160-bit hash value (MD5 only has 128 bits of hash value). SHA-1 was designed and introduced by the NIST and NSA for use in the Digital Signature Standard (DSS). The original published description was called SHA [27], but very soon afterwards, was revised to become SHA-1 [28], supposedly to correct a security flaw in SHA (although the NSA has not released the mathematical reasoning behind the change).

[5267] There are no known cryptographic attacks against SHA-1 [78]. It is also more resistant to brute force attacks than MD4 or MD5 simply because of the longer hash result.

[5268] The US Government owns the SHA-1 and DSA algorithms (a digital signature authentication algorithm defined as part of DSS [29]) and has at least one relevant patent (U.S. Pat. No. 5,231,688 granted in 1993). However, according to NIST [61]:

[5269] “The DSA patent and any foreign counterparts that may issue are available for use without any written permission from or any payment of royalties to the U.S. government.”

[5270] In a much stronger declaration, NIST states in the same document [61] that DSA and SHA-1 do not infringe third parties' rights:

[5271] “NIST reviewed all of the asserted patents and concluded that none of them would be infringed by DSS. Extra protection will be written into the PKI pilot project that will prevent an organization or individual from suing anyone except the government for patent infringement during the course of the project.”

[5272] It must, however, be noted that the holder of the patent on the Schnorr authentication algorithm [81] (U.S. Pat. No. 4,995,082) claims that DSA infringes his patent. The Schnorr patent is not due to expire until 2008. Fortunately this does not affect SHA-1.

[5273] 5.5.3.4 RIPEMD-160

[5274] RIPEMD-160 [66] is a hash function derived from its predecessor RIPEMD [11] (developed for the European Community's RIPE project in 1992). As its name suggests, RIPEMD-160 produces a 160-bit hash result. Tuned for software implementations on 32-bit architectures, RIPEMD-160 is intended to provide a high level of security for 10 years or more.

[5275] Although there have been no successful attacks on RIPEMD-160, it is comparatively new and has not been extensively cryptanalyzed. The original RIPEMD algorithm [11] was specifically designed to resist known cryptographic attacks on MD4. The recent attacks on MD5 (detailed in [20]) showed similar weaknesses in the RIPEMD 128-bit hash function. Although the attacks showed only theoretical weaknesses, Dobbertin, Preneel and Bosselaers further strengthened RIPEMD into a new algorithm RIPEMD-160.

[5276] RIPEMD-160 is in the public domain, and requires no licensing or royalty payments.

[5277] 5.5.4 Message Authentication Codes

[5278] The problem of message authentication can be summed up as follows:

[5279] How can A be sure that a message supposedly from B is in fact from B?

[5280] Message authentication is different from entity authentication (described in the section on cryptographic challenge-response protocols). With entity authentication, one entity (the claimant) proves its identity to another (the verifier). With message authentication, we are concerned with making sure that a given message is from whom we think it is from, i.e. it has not been tampered with en route from the source to its destination. While this section has a brief overview of message authentication, a more detailed survey can be found in [88].

[5281] A one-way hash function is not sufficient protection for a message. Hash functions such as MD5 rely on generating a hash value that is representative of the original input, and the original input cannot be derived from the hash value. A simple attack by E, who is in-between A and B, is to intercept the message from B, and substitute his own. Even if B also sends a hash of the original message, E can simply substitute the hash of his new message. Using a one-way hash function alone, A has no way of knowing that B's message has been changed.

[5282] One solution to the problem of message authentication is the Message Authentication Code, or MAC.

[5283] When B sends message M, it also sends MAC[M] so that the receiver will know that M is actually from B. For this to be possible, only B must be able to produce a MAC of M, and in addition, A should be able to verify M against MAC[M]. Notice that this is different from encryption of M: MACs are useful when M does not have to be secret.

[5284] The simplest method of constructing a MAC from a hash function is to encrypt the hash value with a symmetric algorithm:

[5285] 1. Hash the input message H[M]

[5286] 2. Encrypt the hash E_(K)[H[M]]

[5287] This is more secure than first encrypting the message and then hashing the encrypted message. Any symmetric or asymmetric cryptographic function can be used, with the appropriate advantages and disadvantages of each type described in Section 5.2 on page 629 and Section 5.3 on page 632.

[5288] However, there are advantages to using a key-dependent one-way hash function instead of techniques that use encryption (such as that shown above):

[5289] Speed, because one-way hash functions in general work much faster than encryption;

[5290] Message size, because E_(K)[M] is at least the same size as M, while H[M] is a fixed size (usually considerably smaller than M);

[5291] Hardware/software requirements, because keyed one-way hash functions are typically far less complex than their encryption-based counterparts; and

[5292] One-way hash function implementations are not considered to be encryption or decryption devices and therefore are not subject to US export controls.

[5293] It should be noted that hash functions were never originally designed to contain a key or to support message authentication. As a result, some ad hoc methods of using hash functions to perform message authentication, including various functions that concatenate messages with secret prefixes, suffixes, or both, have been proposed [56]. Most of these ad hoc methods have been successfully attacked by sophisticated means [42]. Additional MACs have been suggested based on XOR schemes [8] and Toeplitz matrices [49] (including the special case of LFSR-based (Linear Feedback Shift Register) constructions).

[5294] 5.5.4.1 HMAC

[5295] The HMAC construction [6] in particular is gaining acceptance as a solution for Internet message authentication security protocols. The HMAC construction acts as a wrapper, using the underlying hash function in a black-box way. Replacement of the hash function is straightforward if desired due to security or performance reasons. However, the major advantage of the HMAC construct is that it can be proven secure provided the underlying hash function has some reasonable cryptographic strengths; that is, HMAC's strengths are directly connected to the strength of the hash function [6].

[5296] Since the HMAC construct is a wrapper, any iterative hash function can be used in an HMAC. Examples include HMAC-MD5, HMAC-SHA1, HMAC-RIPEMD160 etc.

[5297] Given the following definitions:

[5298] H=the hash function (e.g. MD5 or SHA-1)

[5299] n=number of bits output from H (e.g. 160 bits for SHA-1, 128 bits for MD5)

[5300] M=the data to which the MAC function is to be applied

[5301] K=the secret key shared by the two parties

[5302] ipad=0x36 repeated 64 times

[5303] opad=0x5C repeated 64 times

[5304] The HMAC algorithm is as follows:

[5305] 1. Extend K to 64 bytes by appending 0x00 bytes to the end of K

[5306] 2. XOR the 64 byte string created in (1) with ipad

[5307] 3. Append data stream M to the 64 byte string created in (2)

[5308] 4. Apply H to the stream generated in (3)

[5309] 5. XOR the 64 byte string created in (1) with opad

[5310] 6. Append the H result from (4) to the 64 byte string resulting from (5)

[5311] 7. Apply H to the output of (6) and output the result

[5312] Thus:

HMAC[M]=H[(K ⊕ opad)|H[(K ⊕ ipad)|M]]
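The seven steps above can be sketched in Python as follows. The function name `hmac_sketch` is hypothetical, SHA-1 is chosen only as an example of H, and the sketch assumes a key of at most 64 bytes, in line with the key length recommendation in this section. Python's standard `hashlib` and `hmac` modules are used, the latter only as a cross-check.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # length of the hashing block for MD5 and SHA-1

def hmac_sketch(key: bytes, message: bytes, hash_name: str = "sha1") -> bytes:
    """Compute H[(K xor opad) | H[(K xor ipad) | M]] step by step."""
    if len(key) > BLOCK_BYTES:
        raise ValueError("this sketch assumes a key of at most 64 bytes")
    k = key.ljust(BLOCK_BYTES, b"\x00")                     # step 1: pad K with 0x00
    k_ipad = bytes(b ^ 0x36 for b in k)                     # step 2: K xor ipad
    inner = hashlib.new(hash_name, k_ipad + message).digest()   # steps 3 and 4
    k_opad = bytes(b ^ 0x5C for b in k)                     # step 5: K xor opad
    return hashlib.new(hash_name, k_opad + inner).digest()  # steps 6 and 7

tag = hmac_sketch(b"shared secret", b"message to authenticate")

# Cross-check against the standard library implementation.
reference = hmac.new(b"shared secret", b"message to authenticate", hashlib.sha1).digest()
```

Because H is only ever called as a black box, swapping in a different iterative hash function changes nothing but the `hash_name` argument.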

[5313] The recommended key length is at least n bits, although it should not be longer than 64 bytes (the length of the hashing block). A key longer than n bits does not add to the security of the function.

[5314] HMAC optionally allows truncation of the final output e.g.truncation to 128 bits from 160 bits.

[5315] The HMAC designers' Request for Comments [51] was issued in 1997, one year after the algorithm was first introduced. The designers claimed that the strongest known attack against HMAC is based on the frequency of collisions for the hash function H (see Section 14.10 on page 700), and is totally impractical for minimally reasonable hash functions:

[5316] As an example, if we consider a hash function like MD5 where the output length is 128 bits, the attacker needs to acquire the correct message authentication tags computed (with the same secret key K) on about 2⁶⁴ known plaintexts. This would require the processing of at least 2⁶⁴ blocks under H, an impossible task in any realistic scenario (for a block length of 64 bytes this would take 250,000 years in a continuous 1 Gbps link, and without changing the secret key K all this time). This attack could become realistic only if serious flaws in the collision behavior of the function H are discovered (e.g. collisions found after 2³⁰ messages). Such a discovery would determine the immediate replacement of function H (the effects of such a failure would be far more severe for the traditional uses of H in the context of digital signatures, public key certificates etc).

[5317] Of course, if a 160-bit hash function is used, then 2⁶⁴ should be replaced with 2⁸⁰.

[5318] This should be contrasted with a regular collision attack on cryptographic hash functions where no secret key is involved and 2⁶⁴ off-line parallelizable operations suffice to find collisions.

[5319] More recently, HMAC protocols with replay prevention components [62] have been defined in order to prevent the capture and replay of any M, HMAC[M] combination within a given time period.

[5320] Finally, it should be noted that HMAC is in the public domain [50], and incurs no licensing fees. There are no known patents infringed by HMAC.

[5321] 5.6 Random Numbers and Time Varying Messages

[5322] The use of a random number generator as a one-way function has already been examined. However, random number generator theory is very much intertwined with cryptography, security, and authentication.

[5323] There are a large number of issues concerned with defining good random number generators. Knuth, in [48], describes what makes a generator good (including statistical tests), and the general problems associated with constructing them. Moreau gives a high level survey of the current state of the field in [60].

[5324] One of the uses for random numbers is to ensure that messages vary over time. Consider a system where A encrypts commands and sends them to B. If the encryption algorithm produces the same output for a given input, an attacker could simply record the messages and play them back to fool B. There is no need for the attacker to crack the encryption mechanism other than to know which message to play to B (while pretending to be A). Consequently messages often include a random number and a time stamp to ensure that the message (and hence its encrypted counterpart) varies each time.

[5325] Random number generators are also often used to generate keys. Although Klapper has recently shown [45] that a family of secure feedback registers for the purposes of building key-streams does exist, he does not give any practical construction. It is therefore best to say at the moment that all generators are insecure for this purpose. For example, the Berlekamp-Massey algorithm [54] is a classic attack on an LFSR random number generator. If the LFSR is of length n, then only 2n bits of the sequence suffice to determine the LFSR, compromising the key generator.
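The claim that 2n output bits determine an LFSR of length n can be illustrated with a Python sketch of the Berlekamp-Massey algorithm over GF(2). The helper `lfsr_bits`, the 4-bit register with feedback s_(i) = s_(i−3) XOR s_(i−4), and its initial state are assumptions chosen purely for illustration.

```python
def lfsr_bits(taps, state, count):
    """Generate `count` bits where s[i] is the XOR of s[i - t] for t in taps."""
    s = list(state)
    while len(s) < count:
        bit = 0
        for t in taps:
            bit ^= s[len(s) - t]
        s.append(bit)
    return s[:count]

def berlekamp_massey(bits):
    """Return (length, c) of the shortest LFSR generating `bits` over GF(2).

    c[j] == 1 means s[i - j] feeds into s[i], for j = 1..length.
    """
    n = len(bits)
    c = [1] + [0] * n          # current connection polynomial
    b = [1] + [0] * n          # polynomial before the last length change
    length, m = 0, -1
    for i in range(n):
        # Discrepancy between the predicted bit and the observed bit.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, previous
    return length, c[:length + 1]

# A length-4 LFSR observed for only 2 * 4 = 8 bits.
true_taps = (3, 4)
stream = lfsr_bits(true_taps, [1, 0, 0, 0], 40)
length, c = berlekamp_massey(stream[:8])
recovered_taps = tuple(j for j in range(1, length + 1) if c[j])

# The recovered register reproduces the rest of the stream.
predicted = lfsr_bits(recovered_taps, stream[:length], 40)
```

Once the taps are recovered, every later output bit, and hence any key derived from the stream, can be computed by the attacker.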

[5326] If the only role of the random number generator is to make sure that messages vary over time, the security of the generator and seed is not as important as it is for session key generation. If, however, the random number seed generator is compromised, and an attacker is able to calculate future “random” numbers, it can leave some protocols open to attack. Any new protocol should be examined with respect to this situation.

[5327] The actual type of random number generator required will depend upon the implementation and the purposes for which the generator is used. Generators include Blum, Blum, and Shub [10], stream ciphers such as RC4 by Ron Rivest [71], hash functions such as SHA-1 [28] and RIPEMD-160 [66], and traditional generators such as LFSRs (Linear Feedback Shift Registers) [48] and their more recent counterpart FCSRs (Feedback with Carry Shift Registers) [44].

[5328] 5.7 Attacks

[5329] This section describes the various types of attacks that can be undertaken to break an authentication cryptosystem. The attacks are grouped into physical and logical attacks.

[5330] Logical attacks work on the protocols or algorithms rather than their physical implementation, and attempt to do one of three things:

[5331] Bypass the authentication process altogether

[5332] Obtain the secret key by force or deduction, so that any question can be answered

[5333] Find enough about the nature of the authenticating questions and answers in order to, without the key, give the right answer to each question.

[5334] Regardless of the algorithms and protocol used by a security chip, the circuitry of the authentication part of the chip can come under physical attack. Physical attacks fall into four main categories, although the form of the attack can vary:

[5335] Bypassing the security chip altogether

[5336] Physical examination of the chip while in operation (destructiveand non-destructive)

[5337] Physical decomposition of chip

[5338] Physical alteration of chip

[5339] The attack styles and the forms they take are detailed below.

[5340] This section does not suggest solutions to these attacks. It merely describes each attack type. The examination is restricted to the context of an authentication chip (as opposed to some other kind of system, such as Internet authentication) attached to some System.

[5341] 5.7.1 Logical Attacks

[5342] These attacks are those which do not depend on the physical implementation of the cryptosystem. They work against the protocols and the security of the algorithms and random number generators.

[5343] 5.7.1.1 Ciphertext Only Attack

[5344] This is where an attacker has one or more encrypted messages, all encrypted using the same algorithm. The aim of the attacker is to obtain the plaintext messages from the encrypted messages. Ideally, the key can be recovered so that all messages in the future can also be recovered.

[5345] 5.7.1.2 Known Plaintext Attack

[5346] This is where an attacker has both the plaintext and the encrypted form of the plaintext. In the case of an authentication chip, a known-plaintext attack is one where the attacker can see the data flow between the system and the authentication chip. The inputs and outputs are observed (not chosen by the attacker), and can be analyzed for weaknesses (such as birthday attacks or by a search for differentially interesting input/output pairs).

[5347] A known plaintext attack can be carried out by connecting a logic analyzer to the connection between the system and the authentication chip.

[5348] 5.7.1.3 Chosen Plaintext Attacks

[5349] A chosen plaintext attack describes one where a cryptanalyst has the ability to send any chosen message to the cryptosystem, and observe the response. If the cryptanalyst knows the algorithm, there may be a relationship between inputs and outputs that can be exploited by feeding a specific output to the input of another function.

[5350] The chosen plaintext attack is much stronger than the known plaintext attack since the attacker can choose the messages rather than simply observe the data flow.

[5351] On a system using an embedded authentication chip, it is generally very difficult to prevent chosen plaintext attacks since the cryptanalyst can logically pretend he/she is the system, and thus send any chosen bit-pattern streams to the authentication chip.

[5352] 5.7.1.4 Adaptive Chosen Plaintext Attacks

[5353] This type of attack is similar to the chosen plaintext attacks except that the attacker has the added ability to modify subsequent chosen plaintexts based upon the results of previous experiments. This is certainly the case with any system/authentication chip scenario described for consumables such as photocopiers and toner cartridges, especially since both systems and consumables are made available to the public.

[5354] 5.7.1.5 Brute Force Attack

[5355] A guaranteed way to break any key-based cryptosystem algorithm is simply to try every key.

[5356] Eventually the right one will be found. This is known as a brute force attack. However, the more key possibilities there are, the more keys must be tried, and hence the longer it takes (on average) to find the right one. If there are N keys, it will take a maximum of N tries. If the key is N bits long, it will take a maximum of 2^(N) tries, with a 50% chance of finding the key after only half the attempts (2^(N−1)). The longer N becomes, the longer it will take to find the key, and hence the more secure the key is. Of course, an attacker may guess the key on the first try, but this becomes less likely the longer the key is.

[5357] Consider a key length of 56 bits. In the worst case, all 2⁵⁶ tests (7.2×10¹⁶ tests) must be made to find the key. In 1977, Diffie and Hellman described a specialized machine for cracking DES, consisting of one million processors, each capable of running one million tests per second [17]. Such a machine would take 20 hours to break any DES code.

[5358] Consider a key length of 128 bits. In the worst case, all 2¹²⁸ tests (3.4×10³⁸ tests) must be made to find the key. This would take ten billion years on an array of a trillion processors each running 1 billion tests per second.
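The two worst-case figures quoted above can be reproduced with a few lines of Python arithmetic. The sketch assumes that "trillion" means 10¹² and "billion" means 10⁹, and uses a 365.25-day year; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# 56-bit key: one million processors, each running one million tests per second.
des_tests = 2 ** 56
des_rate = 10 ** 6 * 10 ** 6                 # tests per second for the whole machine
des_hours = des_tests / des_rate / SECONDS_PER_HOUR

# 128-bit key: a trillion processors, each running a billion tests per second.
k128_tests = 2 ** 128
k128_rate = 10 ** 12 * 10 ** 9               # tests per second for the whole array
k128_years = k128_tests / k128_rate / SECONDS_PER_YEAR

print(f"56-bit worst case:  {des_hours:.1f} hours")
print(f"128-bit worst case: {k128_years:.2e} years")
```

Under these assumptions the results are roughly 20 hours and roughly 1.1×10¹⁰ years, consistent with the figures in the text.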

[5359] With a long enough key length, a brute force attack takes too long to be worth the attacker's efforts.

[5360] 5.7.1.6 Guessing Attack

[5361] This type of attack is where an attacker attempts to simply “guess” the key. As an attack it is identical to the brute force attack (see Section 5.7.1.5 on page 647) where the odds of success depend on the length of the key.

[5362] 5.7.1.7 Quantum Computer Attack

[5363] To break an n-bit key, a quantum computer [83] (NMR, Optical, orCaged Atom) containing n qubits embedded in an appropriate algorithmmust be built. The quantum computer effectively exists in 2^(n)simultaneous coherent states. The trick is to extract the right coherentstate without causing any decoherence. To date this has been achievedwith a 2 qubit system (which exists in 4 coherent states). It is thoughtpossible to extend this to 6 qubits (with 64 simultaneous coherentstates) within a few years.

[5364] Unfortunately, every additional qubit halves the relative strength of the signal representing the key. This rapidly becomes a serious impediment to key retrieval, especially with the long keys used in cryptographically secure systems.

[5365] As a result, attacks on a cryptographically secure key (e.g. 160 bits) using a quantum computer are unlikely to be feasible, and it is extremely unlikely that quantum computers will have achieved more than 50 or so qubits within the commercial lifetime of the authentication chips. Even using a 50 qubit quantum computer, 2¹¹⁰ tests are required to crack a 160 bit key.

[5366] 5.7.1.8 Purposeful Error Attack

[5367] With certain algorithms, attackers can gather valuableinformation from the results of a bad input. This can range from theerror message text to the time taken for the error to be generated.

[5368] A simple example is that of a userid/password scheme. If the error message usually says “Bad userid”, then when an attacker gets a message saying “Bad password” instead, they know that the userid is correct. If the message always says “Bad userid/password” then much less information is given to the attacker. A more complex example is that of the recently published method of cracking encryption codes from secure web sites [41]. The attack involves sending particular messages to a server and observing the error message responses. The responses give enough information to learn the keys; even the lack of a response gives some information.

[5369] An example of algorithmic time can be seen with an algorithm that returns an error as soon as an erroneous bit is detected in the input message. Depending on hardware implementation, it may be a simple matter for the attacker to time the response and alter each bit one by one depending on the time taken for the error response, and thus obtain the key. Certainly in a chip implementation the time taken can be observed with far greater accuracy than over the Internet.
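The two leaks described above (distinct error messages and early-exit timing) can be illustrated in software. This is a hedged sketch, not a description of any chip implementation: the function names and the `users` table are hypothetical, and plain stored passwords are used only to keep the example short. The constant-time comparison uses `hmac.compare_digest` from the Python standard library.

```python
import hmac


def leaky_equal(expected: bytes, supplied: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatch."""
    if len(expected) != len(supplied):
        return False
    for a, b in zip(expected, supplied):
        if a != b:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(expected: bytes, supplied: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(expected, supplied)


def check_login(users: dict, userid: str, password: bytes) -> str:
    """Return one uniform error message for a bad userid or a bad password."""
    stored = users.get(userid)
    # Compare against a dummy value for unknown userids so that both
    # failure cases perform the same kind of work.
    candidate = stored if stored is not None else b"\x00"
    matches = constant_time_equal(candidate, password)
    if stored is not None and matches:
        return "OK"
    return "Bad userid/password"


users = {"alice": b"secret"}  # hypothetical test data
print(check_login(users, "alice", b"secret"))
print(check_login(users, "alice", b"wrong"))
print(check_login(users, "mallory", b"secret"))
```

Both functions return the same answers; only `leaky_equal` reveals, through its running time, how many leading bytes of a guess were correct.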

[5370] 5.7.1.9 Birthday Attack

[5371] This attack is named after the famous “birthday paradox” (which is not actually a paradox at all). The odds of one person sharing a birthday with another is 1 in 365 (not counting leap years). Therefore there must be 253 other people in a room for the odds to be more than 50% that one of them shares your birthday, since 1 − (364/365)^(n) first exceeds 0.5 at n = 253. However, there only needs to be 23 people in a room for there to be more than a 50% chance that any two share a birthday, as shown in the following relation:

$\mathrm{Prob} = 1 - \frac{{}^{n}P_{r}}{n^{r}} = 1 - \frac{{}^{365}P_{23}}{365^{23}} \approx 0.507$

[5372] Birthday attacks are common attacks against hashing algorithms,especially those algorithms that combine hashing with digitalsignatures.

[5373] If a message has been generated and already signed, an attacker must search for a collision message that hashes to the same value (analogous to finding one person who shares your birthday). However, if the attacker can generate the message, the birthday attack comes into play. The attacker searches for two messages that share the same hash value (analogous to any two people sharing a birthday), where one message is acceptable to the person signing it and the other is beneficial for the attacker. Once the person has signed the original message, the attacker simply claims that the person signed the alternative message. Mathematically there is no way to tell which message was the original, since they both hash to the same value.

[5374] Assuming a brute force attack is the only way to determine a match, the birthday attack reduces the work needed against an n-bit key to about 2^(n/2) tests. A key length of 128 bits that is susceptible to the birthday attack has an effective length of only 64 bits.
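The birthday relation and the resulting halving of effective key length can be checked numerically. The sketch below is illustrative only; the function names are assumptions, and `math.perm` requires Python 3.8 or later.

```python
from math import perm


def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    return 1 - perm(days, people) / days ** people


def effective_bits_under_birthday_attack(key_bits: int) -> int:
    """Effective length when about 2^(n/2) tests suffice to find a match."""
    return key_bits // 2


probability_23 = shared_birthday_probability(23)

# Smallest group for which a shared birthday is more likely than not
smallest_group = next(
    r for r in range(1, 366) if shared_birthday_probability(r) > 0.5
)

print(f"23 people: {probability_23:.3f}")
print(f"smallest group above 50%: {smallest_group}")
print(f"128-bit value: {effective_bits_under_birthday_attack(128)} effective bits")
```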

[5375] 5.7.1.10 Chaining Attack

[5376] These are attacks made against the chaining nature of hashfunctions. They focus on the compression function of a hash function.The idea is based on the fact that a hash function generally takesarbitrary length input and produces a constant length output byprocessing the input n bits at a time. The output from one block is usedas the chaining variable set into the next block. Rather than finding acollision against an entire input, the idea is that given an inputchaining variable set, to find a substitute block that will result inthe same output chaining variables as the proper message.

[5377] The number of choices for a particular block is based on thelength of the block. If the chaining variable is c bits, the hashingfunction behaves like a random mapping, and the block length is b bits,the number of such b-bit blocks is approximately 2^(b)/2^(c). Thechallenge for finding a substitution block is that such blocks are asparse subset of all possible blocks.

[5378] For SHA-1, the number of 512 bit blocks is approximately2⁵¹²/2¹⁶⁰, or 2³⁵². The chance of finding a block by brute force searchis about 1 in 2¹⁶⁰.

[5379] 5.7.1.11 Substitution with a Complete Lookup Table

[5380] If the number of potential messages sent to the chip is small,then there is no need for a clone manufacturer to crack the key.Instead, the clone manufacturer could incorporate a ROM in their chipthat had a record of all of the responses from a genuine chip to thecodes sent by the system. The larger the key, and the larger theresponse, the more space is required for such a lookup table.

[5381] 5.7.1.12 Substitution with a Sparse Lookup Table

[5382] If the messages sent to the chip are somehow predictable, ratherthan effectively random, then the clone manufacturer need not provide acomplete lookup table. For example:

[5383] If the message is simply a serial number, the clone manufacturerneed simply provide a lookup table that contains values for past andpredicted future serial numbers. There are unlikely to be more than 10⁹of these.

[5384] If the test code is simply the date, then the clone manufacturercan produce a lookup table using the date as the address.

[5385] If the test code is a pseudo-random number using either theserial number or the date as a seed, then the clone manufacturer justneeds to crack the pseudo-random number generator in the system. This isprobably not difficult, as they have access to the object code of thesystem. The clone manufacturer would then produce a content addressablememory (or other sparse array lookup) using these codes to access storedauthentication codes.

[5386] 5.7.1.13 Differential Cryptanalysis

[5387] Differential cryptanalysis describes an attack where pairs ofinput streams are generated with known differences, and the differencesin the encoded streams are analyzed.

[5388] Existing differential attacks are heavily dependent on thestructure of S boxes, as used in DES and other similar algorithms.Although other algorithms such as HMAC-SHA1 have no S boxes, an attackercan undertake a differential-like attack by undertaking statisticalanalysis of:

[5389] Minimal-difference inputs, and their corresponding outputs

[5390] Minimal-difference outputs, and their corresponding inputs

[5391] Most algorithms were strengthened against differentialcryptanalysis once the process was described. This is covered in thespecific sections devoted to each cryptographic algorithm. However somerecent algorithms developed in secret have been broken because thedevelopers had not considered certain styles of differential attacks[94] and did not subject their algorithms to public scrutiny.

[5392] 5.7.1.14 Message Substitution Attacks

[5393] In certain protocols, a man-in-the-middle can substitute part orall of a message. This is where a real authentication chip is pluggedinto a reusable clone chip within the consumable. The clone chipintercepts all messages between the system and the authentication chip,and can perform a number of substitution attacks.

[5394] Consider a message containing a header followed by content. An attacker may not be able to generate a valid header, but may be able to substitute their own content, especially if the valid response is something along the lines of “Yes, I received your message”. Even if the return message is “Yes, I received the following message . . . ”, the attacker may be able to substitute the original message before sending the acknowledgment back to the original sender.

[5395] Message Authentication Codes were developed to combat message substitution attacks.
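As an illustration of how a Message Authentication Code defeats content substitution, the following sketch tags a message with HMAC-SHA1 using the Python standard library. The choice of HMAC-SHA1 here, the function names, the key and the message contents are all assumptions for the example; this passage does not prescribe a particular MAC.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA1 tag over the whole message (header and content)."""
    return hmac.new(key, message, hashlib.sha1).digest()


def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)


shared_key = b"hypothetical shared secret"
original = b"HEADER|ink remaining: 12"
tag = make_mac(shared_key, original)

# A man-in-the-middle keeps the header but substitutes the content
substituted = b"HEADER|ink remaining: 99"

print(verify_mac(shared_key, original, tag))
print(verify_mac(shared_key, substituted, tag))
```

Without the key, the attacker cannot produce a tag that matches the substituted content, so the receiver rejects the altered message.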

[5396] 5.7.1.15 Reverse Engineering the Key Generator

[5397] If a pseudo-random number generator is used to generate keys, there is the potential for a clone manufacturer to obtain the generator program or to deduce the random seed used. This was the way in which the security layer of the Netscape browser program was initially broken [33].

[5398] 5.7.1.16 Bypassing the Authentication Process

[5399] It may be that there are problems in the authentication protocolsthat can allow a bypass of the authentication process altogether. Withthese kinds of attacks the key is completely irrelevant, and theattacker has no need to recover it or deduce it.

[5400] Consider an example of a system that authenticates at power-up, but does not authenticate at any other time. A reusable consumable with a clone authentication chip may make use of a real authentication chip. The clone authentication chip uses the real chip for the authentication call, and then simulates the real authentication chip's state data after that.

[5401] Another example of bypassing authentication is if the systemauthenticates only after the consumable has been used. A cloneauthentication chip can accomplish a simple authentication bypass bysimulating a loss of connection after the use of the consumable butbefore the authentication protocol has completed (or even started).

[5402] One infamous attack known as the “Kentucky Fried Chip” hack [2]involved replacing a microcontroller chip for a satellite TV system.When a subscriber stopped paying the subscription fee, the system wouldsend out a “disable” message. However the new micro-controller wouldsimply detect this message and not pass it on to the consumer'ssatellite TV system.

[5403] 5.7.1.17 Garrote/Bribe Attack

[5404] If people know the key, there is the possibility that they could tell someone else. The telling may be due to coercion (bribe, garrote etc.), revenge (e.g. a disgruntled employee), or simply for principle. These attacks are usually cheaper and easier than other efforts at deducing the key. As an example, a number of people claiming to be involved with the development of the (now defunct) Divx standard for DVD claimed (before the standard was rejected by consumers) that they would like to help develop Divx specific cracking devices, out of principle.

[5405] 5.7.2 Physical Attacks

[5406] The following attacks assume implementation of an authenticationmechanism in a silicon chip that the attacker has physical access to.The first attack, Reading ROM, describes an attack when keys are storedin ROM, while the remaining attacks assume that a secret key is storedin Flash memory.

[5407] 5.7.2.1 Reading ROM

[5408] If a key is stored in ROM it can be read directly. A ROM can thus be safely used to hold a public key (for use in asymmetric cryptography), but not to hold a private key. In symmetric cryptography, a ROM is completely insecure. Using a copyright text (such as a haiku) as the key is not sufficient, because we are assuming that the cloning of the chip is occurring in a country where intellectual property is not respected.

[5409] 5.7.2.2 Reverse Engineering of Chip

[5410] Reverse engineering of the chip is where an attacker opens thechip and analyzes the circuitry. Once the circuitry has been analyzedthe inner workings of the chip's algorithm can be recovered. LucentTechnologies have developed an active method [4] known as TOBIC (Twophoton OBIC, where OBIC stands for Optical Beam Induced Current), toimage circuits. Developed primarily for static RAM analysis, the processinvolves removing any back materials, polishing the back surface to amirror finish, and then focusing light on the surface. The excitationwavelength is specifically chosen not to induce a current in the IC.

[5411] In the nineteenth century, A. Kerckhoffs made a fundamental assumption about cryptanalysis: if the algorithm's inner workings are the sole secret of the scheme, the scheme is as good as broken [39]. He stipulated that the secrecy must reside entirely in the key. As a result, the best way to protect against reverse engineering of the chip is to make the inner workings irrelevant.

[5412] 5.7.2.3 Usurping the Authentication Process

[5413] It must be assumed that any clone manufacturer has access to boththe system and consumable designs.

[5414] If the same channel is used for communication between the system and a trusted system authentication chip, and a non-trusted consumable authentication chip, it may be possible for the non-trusted chip to interrogate a trusted authentication chip in order to obtain the “correct answer”. If this is so, a clone manufacturer would not have to determine the key. They would only have to trick the system into using the responses from the system authentication chip.

[5415] The alternative method of usurping the authentication processfollows the same method as the logical attack described in Section5.7.1.16 on page 652, involving simulated loss of contact with thesystem whenever authentication processes take place, simulatingpower-down etc.

[5416] 5.7.2.4 Modification of System

[5417] This kind of attack is where the system itself is modified toaccept clone consumables. The attack may be a change of system ROM, arewiring of the consumable, or, taken to the extreme case, a completelyclone system.

[5418] Note that this kind of attack requires each individual system tobe modified, and would most likely require the owner's consent. Therewould usually have to be a clear advantage for the consumer to undertakesuch a modification, since it would typically void warranty and wouldmost likely be costly. An example of such a modification with a clearadvantage to the consumer is a software patch to change fixed-region DVDplayers into region-free DVD players (although it should be noted thatthis is not to use clone consumables, but rather originals from the samecompanies simply targeted for sale in other countries).

[5419] 5.7.2.5 Direct Viewing of Chip Operation by Conventional Probing

[5420] If chip operation could be directly viewed using an STM (ScanningTunnelling Microscope) or an electron beam, the keys could be recordedas they are read from the internal non-volatile memory and loaded intowork registers.

[5421] These forms of conventional probing require direct access to thetop or front sides of the IC while it is powered.

[5422] 5.7.2.6 Direct Viewing of the Non-Volatile Memory

[5423] If the chip were sliced so that the floating gates of the Flashmemory were exposed, without discharging them, then the key couldprobably be viewed directly using an STM or SKM (Scanning KelvinMicroscope).

[5424] However, slicing the chip to this level without discharging thegates is probably impossible. Using wet etching, plasma etching, ionmilling (focused ion beam etching), or chemical mechanical polishingwill almost certainly discharge the small charges present on thefloating gates.

[5425] 5.7.2.7 Viewing the Light Bursts Caused by State Changes

[5426] Whenever a gate changes state, a small amount of infrared energy is emitted. Since silicon is transparent to infrared, these changes can be observed by looking at the circuitry from the underside of a chip. While the emission process is weak, it is bright enough to be detected by highly sensitive equipment developed for use in astronomy. The technique [92], developed by IBM, is called PICA (Picosecond Imaging Circuit Analyzer). If the state of a register is known at time t, then watching that register change over time will reveal the exact value at time t+n, and if the data is part of the key, then that part is compromised.

[5427] 5.7.2.8 Viewing the Keys Using an SEPM

[5428] A non-invasive testing device, known as a Scanning ElectricPotential Microscope (SEPM), allows the direct viewing of charges withina chip [37]. The SEPM has a tungsten probe that is placed a fewmicrometers above the chip, with the probe and circuit forming acapacitor. Any AC signal flowing beneath the probe causes displacementcurrent to flow through this capacitor. Since the value of the currentchange depends on the amplitude and phase of the AC signal, the signalcan be imaged. If the signal is part of the key, then that part iscompromised.

[5429] 5.7.2.9 Monitoring EMI

[5430] Whenever electronic circuitry operates, faint electromagneticsignals are given off. Relatively inexpensive equipment can monitorthese signals and could give enough information to allow an attacker todeduce the keys.

[5431] 5.7.2.10 Viewing I_(dd) Fluctuations

[5432] Even if keys cannot be viewed, there is a fluctuation in currentwhenever registers change state. If there is a high enough signal tonoise ratio, an attacker can monitor the difference in I_(dd) that mayoccur when programming over either a high or a low bit. The change inI_(dd) can reveal information about the key. Attacks such as these havealready been used to break smart cards [46].

[5433] 5.7.2.11 Differential Fault Analysis

[5434] This attack assumes introduction of a bit error by ionization,microwave radiation, or environmental stress. In most cases such anerror is more likely to adversely affect the chip (e.g. cause theprogram code to crash) rather than cause beneficial changes which wouldreveal the key. Targeted faults such as ROM overwrite, gate destructionetc. are far more likely to produce useful results.

[5435] 5.7.2.12 Clock Glitch Attacks

[5436] Chips are typically designed to properly operate within a certain clock speed range. Some attackers attempt to introduce faults in logic by running the chip at extremely high clock speeds or introduce a clock glitch at a particular time for a particular duration [1]. The idea is to create race conditions where the circuitry does not function properly. An example could be an AND gate that (because of race conditions) gates through Input₁ all the time instead of the AND of Input₁ and Input₂.

[5437] If an attacker knows the internal structure of the chip, they canattempt to introduce race conditions at the correct moment in thealgorithm execution, thereby revealing information about the key (or inthe worst case, the key itself).

[5438] 5.7.2.13 Power Supply Attacks

[5439] Instead of creating a glitch in the clock signal, attackers canalso produce glitches in the power supply where the power is increasedor decreased to be outside the working operating voltage range. The neteffect is the same as a clock glitch—introduction of error in theexecution of a particular instruction. The idea is to stop the CPU fromXORing the key, or from shifting the data one bit-position etc. Specificinstructions are targeted so that information about the key is revealed.

[5440] 5.7.2.14 Overwriting ROM

[5441] Single bits in a ROM can be overwritten using a laser cuttermicroscope [1], to either 1 or 0 depending on the sense of the logic. Ifthe ROM contains instructions, it may be a simple matter for an attackerto change a conditional jump to a non-conditional jump, or perhapschange the destination of a register transfer. If the target instructionis chosen carefully, it may result in the key being revealed.

[5442] 5.7.2.15 Modifying EEPROM/Flash

[5443] These attacks fall into two categories:

[5444] Those similar to the ROM attacks, except that the laser cutter microscope technique can be used to both set and reset individual bits. This gives much greater scope in terms of modification of algorithms.

[5445] Electron beam programming of floating gates. As described in [89]and [32], a focused electron beam can change a gate by depositingelectrons onto it. Damage to the rest of the circuit can be avoided, asdescribed in [31].

[5446] 5.7.2.16 Gate Destruction

[5447] Anderson and Kuhn described the rump session of the 1997 workshop on Fast Software Encryption [1], where Biham and Shamir presented an attack on DES. The attack was to use a laser cutter to destroy an individual gate in the hardware implementation of a known block cipher (DES). The net effect of the attack was to force a particular bit of a register to be “stuck”. Biham and Shamir described the effect of forcing a particular register to be affected in this way: the least significant bit of the output from the round function is set to 0. Comparing the 6 least significant bits of the left half and the right half can recover several bits of the key. Damaging a number of chips in this way can reveal enough information about the key to make complete key recovery easy.

[5448] An encryption chip modified in this way will have the propertythat encryption and decryption will no longer be inverses.

[5449] 5.7.2.17 Overwrite Attacks

[5450] Instead of trying to read the Flash memory, an attacker may simply set a single bit by use of a laser cutter microscope. Although the attacker doesn't know the previous value, they know the new value. If the chip still works, the bit's original state must be the same as the new state. If the chip doesn't work any longer, the bit's original state must be the logical NOT of the current state. An attacker can perform this attack on each bit of the key and obtain the n-bit key using at most n chips (if the new bit matched the old bit, a new chip is not required for determining the next bit).
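The bit-by-bit reasoning of the overwrite attack can be simulated in software. The sketch below is a toy model under stated assumptions: `SimulatedChip`, `recover_key` and the example key are hypothetical, the laser is assumed to force a bit to 1, and a chip is assumed to "work" only while its stored key is unchanged.

```python
class SimulatedChip:
    """Hypothetical stand-in for a chip whose key bits are held in Flash."""

    def __init__(self, key_bits):
        self._original = list(key_bits)  # key the system expects
        self._stored = list(key_bits)    # key currently held in Flash

    def force_bit(self, index, value):
        """Model the laser cutter: set one stored bit to a known value."""
        self._stored[index] = value

    def still_works(self):
        """The chip authenticates only while its key is unchanged."""
        return self._stored == self._original


def recover_key(new_chip, key_length, forced_value=1):
    """Recover the key one bit at a time; returns (bits, chips_used)."""
    chip = new_chip()
    chips_used = 1
    recovered = []
    for index in range(key_length):
        chip.force_bit(index, forced_value)
        if chip.still_works():
            recovered.append(forced_value)      # bit already had this value
        else:
            recovered.append(1 - forced_value)  # bit was the logical NOT
            if index < key_length - 1:
                chip = new_chip()               # damaged chip is discarded
                chips_used += 1
    return recovered, chips_used


secret = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative key only
bits, chips = recover_key(lambda: SimulatedChip(secret), len(secret))
print(bits == secret, chips)
```

A fresh chip is needed only after a forced bit breaks the current one, which is why the number of chips consumed never exceeds the key length.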

[5451] 5.7.2.18 Test Circuitry Attack

[5452] Most chips contain test circuitry specifically designed to check for manufacturing defects. This includes BIST (Built In Self Test) and scan paths. Quite often the scan paths and test circuitry include access and readout mechanisms for all the embedded latches. In some cases the test circuitry could potentially be used to give information about the contents of particular registers.

[5453] Test circuitry is often disabled once the chip has passed all manufacturing tests, in some cases by blowing a specific connection within the chip. A determined attacker, however, can reconnect the test circuitry and hence enable it.

[5454] 5.7.2.19 Memory Remnants

[5455] Values remain in RAM long after the power has been removed [35], although they do not remain long enough to be considered non-volatile. An attacker can remove power once sensitive information has been moved into RAM (for example working registers), and then attempt to read the value from RAM. This attack is most useful against security systems that have regular RAM chips. A classic example is cited by [1], where a security system was designed with an automatic power-shut-off that was triggered when the computer case was opened. The attacker was able to simply open the case, remove the RAM chips, and retrieve the key because the values persisted.

[5456] 5.7.2.20 Chip Theft Attack

[5457] If there are a number of stages in the lifetime of anauthentication chip, each of these stages must be examined in terms oframifications for security should chips be stolen. For example, ifinformation is programmed into the chip in stages, theft of a chipbetween stages may allow an attacker to have access to key informationor reduced efforts for attack. Similarly, if a chip is stolen directlyafter manufacture but before programming, does it give an attacker anylogical or physical advantage?

[5458] 5.7.2.21 Trojan Horse Attack

[5459] At some stage the authentication chips must be programmed with a secret key. Suppose an attacker builds a clone authentication chip and adds it to the pile of chips to be programmed. The attacker has especially built the clone chip so that it looks and behaves just like a real authentication chip, but will give the key out to the attacker when a special attacker-known command is issued to the chip. Of course the attacker must have access to the chip after the programming has taken place, as well as physical access to add the Trojan horse authentication chip to the genuine chips.

[5460] 6 Requirements

[5461] Existing solutions to the problem of authenticating consumableshave typically relied on patents covering physical packaging. Howeverthis does not stop home refill operations or clone manufacture incountries with weak industrial property protection. Consequently a muchhigher level of protection is required.

[5462] The authentication mechanism is therefore built into anauthentication chip that is embedded in the consumable and allows asystem to authenticate that consumable securely and easily. Limitingourselves to the system authenticating consumables (we don't considerthe consumable authenticating the system), two levels of protection canbe considered:

[5463] Presence Only Authentication:

[5464] This is where only the presence of an authentication chip is tested. The authentication chip can be removed and used in other consumables indefinitely.

[5465] Consumable Lifetime Authentication:

[5466] This is where not only is the presence of the authentication chiptested for, but also the authentication chip must only last the lifetimeof the consumable. For the chip to be re-used it must be completelyerased and reprogrammed.

[5467] The two levels of protection address different requirements. We are primarily concerned with Consumable Lifetime authentication in order to prevent cloned versions of high volume consumables. In this case, each chip should hold secure state information about the consumable being authenticated. It should be noted that a Consumable Lifetime authentication chip could be used in any situation requiring a Presence Only authentication chip.

[5468] Requirements for authentication, data storage integrity andmanufacture are considered separately. The following sections summarizerequirements of each.

[5469] 6.1 Authentication

[5470] The authentication requirements for both Presence Only andConsumable Lifetime authentication are restricted to the case of asystem authenticating a consumable. We do not consider bi-directionalauthentication where the consumable also authenticates the system. Forexample, it is not necessary for a valid toner cartridge to ensure it isbeing used in a valid photocopier.

[5471] For Presence Only authentication, we must be assured that anauthentication chip is physically present. For Consumable Lifetimeauthentication we also need to be assured that state data actually camefrom the authentication chip, and that it has not been altered en route.These issues cannot be separated—data that has been altered has a newsource, and if the source cannot be determined, the question ofalteration cannot be settled.

[5472] It is not enough to provide an authentication method that issecret, relying on a home-brew security method that has not beenscrutinized by security experts. The primary requirement therefore is toprovide authentication by means that have withstood the scrutiny ofexperts.

[5473] The authentication scheme used by the authentication chip shouldbe resistant to defeat by logical means. Logical types of attack areextensive, and attempt to do one of three things:

[5474] Bypass the authentication process altogether

[5475] Obtain the secret key by force or deduction, so that any questioncan be answered

[5476] Find enough about the nature of the authenticating questions andanswers in order to, without the key, give the right answer to eachquestion.

[5477] The logical attack styles and the forms they take are detailed inSection 5.7.1 on page 646.

[5478] The algorithm should have a flat keyspace, allowing any randombit string of the required length to be a possible key. There should beno weak keys.

[5479] 6.2 Data Storage Integrity

[5480] Although authentication protocols take care of ensuring dataintegrity in communicated messages, data storage integrity is alsorequired. Two kinds of data must be stored within the authenticationchip:

[5481] Authentication data, such as secret keys

[5482] Consumable state data, such as serial numbers, and mediaremaining etc.

[5483] The access requirements of these two data types differ greatly.The authentication chip therefore requires a storage/access controlmechanism that allows for the integrity requirements of each type.

[5484] 6.2.1 Authentication Data

[5485] Authentication data must remain confidential. It needs to be stored in the chip during a manufacturing/programming stage of the chip's life, but from then on must not be permitted to leave the chip. It must be resistant to being read from non-volatile memory. The authentication scheme is responsible for ensuring the key cannot be obtained by deduction, and the manufacturing process is responsible for ensuring that the key cannot be obtained by physical means.

[5486] The size of the authentication data memory area must be largeenough to hold the necessary keys and secret information as mandated bythe authentication protocols.

[5487] 6.2.2 Consumable State Data

[5488] Consumable state data can be divided into the following types.Depending on the application, there will be different numbers of each ofthese types of data items.

[5489] Read Only

[5490] ReadWrite

[5491] Decrement Only

[5492] Read Only data needs to be stored in the chip during amanufacturing/programming stage of the chip's life, but from then onshould not be allowed to change. Examples of Read Only data items areconsumable batch numbers and serial numbers.

[5493] ReadWrite data is changeable state information, for example, the last time the particular consumable was used. ReadWrite data items can be read and written an unlimited number of times during the lifetime of the consumable. They can be used to store any state information about the consumable. The only requirement for this data is that it needs to be kept in non-volatile memory. Since an attacker can obtain access to a system (which can write to ReadWrite data), any attacker can potentially change data fields of this type. This data type should not be used for secret information, and must be considered insecure.

[5494] Decrement Only data is used to count down the availability of consumable resources. A photocopier's toner cartridge, for example, may store the amount of toner remaining as a Decrement Only data item. An ink cartridge for a color printer may store the amount of each ink color as a Decrement Only data item, requiring 3 (one for each of Cyan, Magenta, and Yellow), or even as many as 5 or 6 Decrement Only data items. The requirement for this kind of data item is that once programmed with an initial value at the manufacturing/programming stage, it can only reduce in value. Once it reaches the minimum value, it cannot decrement any further. The Decrement Only data item is only required by Consumable Lifetime authentication.
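The Decrement Only requirement can be modelled as a small data type. This is a minimal software sketch, not the chip's storage mechanism: the class name, the default minimum of zero, and the choice to reject (rather than clamp) a decrement that would pass the minimum are all assumptions made for illustration.

```python
class DecrementOnly:
    """Counter that is set once and can afterwards only reduce in value."""

    def __init__(self, initial: int, minimum: int = 0):
        if initial < minimum:
            raise ValueError("initial value must not be below the minimum")
        self._value = initial      # programmed at the manufacturing stage
        self._minimum = minimum

    @property
    def value(self) -> int:
        return self._value

    def decrement(self, amount: int) -> bool:
        """Reduce the value; refuse non-positive amounts and underflow."""
        if amount <= 0:
            return False           # would keep or increase the value
        if self._value - amount < self._minimum:
            return False           # cannot go below the minimum
        self._value -= amount
        return True


cyan = DecrementOnly(10)           # hypothetical ink units remaining
print(cyan.decrement(4), cyan.value)
print(cyan.decrement(-3), cyan.value)
print(cyan.decrement(7), cyan.value)
```

Because there is no method that raises the value, a consumable whose counter has reached its minimum cannot be "refilled" through this interface.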

[5495] Note that the size of the consumable state data storage required is only for that information required to be authenticated. Information which would be of no use to an attacker, such as ink color-curve characteristics or ink viscosity, does not have to be stored in the secure state data memory area of the authentication chip.

[5496] 6.3 Manufacture

[5497] The authentication chip must have a low manufacturing cost inorder to be included as the authentication mechanism for low costconsumables.

[5498] The authentication chip should use a standard manufacturingprocess, such as Flash. This is necessary to:

[5499] Allow a great range of manufacturing location options

[5500] Use well-defined and well-behaved technology

[5501] Reduce cost

[5502] Regardless of the authentication scheme used, the circuitry ofthe authentication part of the chip must be resistant to physicalattack. Physical attack comes in four main ways, although the form ofthe attack can vary:

[5503] Bypassing the authentication chip altogether

[5504] Physical examination of chip while in operation (destructive andnon-destructive)

[5505] Physical decomposition of chip

[5506] Physical alteration of chip

[5507] The physical attack styles and the forms they take are detailed in Section 5.7.2 on page 652. Ideally, the chip should be exportable from the USA, so it should not be possible to use an authentication chip as a secure encryption device. This is a low priority requirement since there are many companies in other countries able to manufacture the authentication chips. In any case, the export restrictions from the USA may change.

[5508] Authentication

[5509] 7 Introduction

[5510] Existing solutions to the problem of authenticating consumables have typically relied on physical patents on packaging. However this does not stop home refill operations or clone manufacture in countries with weak industrial property protection. Consequently a much higher level of protection is required.

[5511] It is not enough to provide an authentication method that is secret, relying on a home-brew security method that has not been scrutinized by security experts. Security systems such as Netscape's original proprietary system and the GSM Fraud Prevention Network used by cellular phones are examples where design secrecy caused the vulnerability of the security [33][33]. Both security systems were broken by conventional means that would have been detected if the companies had followed an open design process. The solution is to provide authentication by means that have withstood the scrutiny of experts.

[5512] In this section, we examine a number of protocols that can be used for consumables authentication. We only use security methods that are publicly described, using known behaviors in this new way. Readers should be familiar with the concepts and terms described in Section 5 on page 629. We avoid the Zero Knowledge Proof protocol since it is patented.

[5513] For all protocols, the security of the scheme relies on a secret key, not a secret algorithm. In the nineteenth century, A Kerckhoffs made a fundamental assumption about cryptanalysis: if the algorithm's inner workings are the sole secret of the scheme, the scheme is as good as broken [39]. He stipulated that the secrecy must reside entirely in the key. As a result, the best way to protect against reverse engineering of any authentication chip is to make the algorithmic inner workings irrelevant (the algorithm of the inner workings must still be valid, but not the actual secret).

[5514] The QA Chip is a programmable device, and can therefore be set up with an application-specific program together with an application-specific set of protocols. This section describes the following sets of protocols:

[5515] single key single memory vector

[5516] multiple key single memory vector

[5517] multiple key multiple memory vector

[5518] These protocols refer to the number of valid keys that a QA Chip knows about, and the size of data required to be stored in the chip.

[5519] From these protocols it is straightforward to construct protocol sets for the single key multiple memory vector case (of course the multiple memory vector can be considered to be. and multiple key single memory vector. Other protocol sets can also be defined as necessary. Of course multiple memory vector can be conveniently

[5520] All the protocols rely on a time-variant challenge (i.e. the challenge is different each time), where the response depends on the challenge and the secret. The challenge involves a random number so that any observer will not be able to gather useful information about a subsequent identification.

[5521] 8 Single Key Single Memory Vector

[5522] 8.1 Protocol Background

[5523] This protocol set is provided for two reasons:

[5524] the other protocol sets defined in this document are simply extensions of this one; and

[5525] it is useful in its own right

[5526] The single key protocol set is useful for applications where only a single key is required. Note that there can be many consumables and systems, but there is only a single key that connects them all. Examples include:

[5527] car and keys. A car and the car-key share a single key. There can be multiple sets of car-keys, each effectively cut to the same key. A company could have a set of cars, each with the same key. Any of the car-keys could then be used to drive any of the cars.

[5528] printer and ink cartridge. All printers of a certain model use the same ink cartridge, with printer and cartridge sharing only a single key. Note that to introduce a new printer model that accepts the old ink cartridge the new model would need the same key as the old model. See the multiple-key protocols for alternative solutions to this problem.

[5529] 8.2 Requirements of Protocol

[5530] Each QA Chip contains the following values:

[5531] K The secret key for calculating F_(K)[X]. K must not be stored directly in the QA Chip. Instead, each chip needs to store a random number R_(K) (different for each chip), K⊕R_(K), and ¬K⊕R_(K). The stored K⊕R_(K) can be XORed with R_(K) to obtain the real K. Although ¬K⊕R_(K) must be stored to protect against differential attacks, it is not used.

[5532] R Current random number used to ensure time varying messages. Each chip instance must be seeded with a different initial value. Changes for each signature generation.

[5533] M Memory vector of QA Chip.

[5534] P 2 element array of access permissions for each part of M. Entry 0 holds access permissions for non-authenticated writes to M (no key required). Entry 1 holds access permissions for authenticated writes to M (key required). Permission choices for each part of M are Read Only, Read/Write, and Decrement Only.

[5535] C 3 constants used for generating signatures. C₁, C₂, and C₃ are constants that pad out a submessage to a hashing boundary, and all 3 must be different.

[5536] Each QA Chip contains the following private function:

[5537] S_(K)[X] Internal function only. Returns S_(K)[X], the result of applying a digital signature function S to X based upon key K. The digital signature must be long enough to counter the chances of someone generating a random signature. The length depends on the signature scheme chosen, although the scheme chosen for the QA Chip is HMAC-SHA1 (see Section 13 on page 691), and therefore the length of the signature is 160 bits.
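By way of illustration only, the recovery of K from its stored form and the calculation of S_(K)[X] can be sketched in Python using the standard `hmac` and `hashlib` modules. The function names are hypothetical, and the sketch assumes that the fields making up X are fixed-length byte strings that are simply concatenated; how the chip actually lays out its fields is not specified here.

```python
import hashlib
import hmac
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def mask_key(key: bytes) -> tuple[bytes, bytes]:
    """Return (R_K, K xor R_K): values stored in place of K itself."""
    r_k = secrets.token_bytes(len(key))
    return r_k, xor_bytes(key, r_k)


def sign(r_k: bytes, masked_key: bytes, *fields: bytes) -> bytes:
    """S_K[X]: recover K from its masked form, then compute HMAC-SHA1
    over the concatenation of the fields (X = field_1 | field_2 | ...)."""
    key = xor_bytes(masked_key, r_k)
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()
```

The digest returned by `sign` is 20 bytes long, matching the 160-bit signature length stated above.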

[5538] Additional functions are required in certain QA Chips, but these are described as required.

[5539] 8.3 Reads of M

[5540] In this case, we have a trusted chip (ChipT) connected to a System. The System wants to authenticate an object that contains a non-trusted chip (ChipA). In effect, the System wants to know that it can securely read a memory vector (M) from ChipA: to be sure that ChipA is valid and that M has not been altered.

[5541] The protocol requires the following publicly available function in ChipA:

[5542] Read[X] Advances R, and returns R, M, S_(K)[X|R|C₁|M]. The time taken to calculate the signature must not be based on the contents of X, R, M, or K.

[5543] The protocol requires the following publicly available functions in ChipT:

[5544] Random Returns R (does not advance R).

[5545] Test[X, Y, Z] Advances R and returns 1 if S_(K)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content.

[5546] To authenticate ChipA and read ChipA's memory M:

[5547] a. System calls ChipT's Random function;

[5548] b. ChipT returns R_(T) to System;

[5549] c. System calls ChipA's Read function, passing in the result from b;

[5550] d. ChipA updates R_(A), then calculates and returns R_(A), M_(A), S_(K)[R_(T)|R_(A)|C₁|M_(A)];

[5551] e. System calls ChipT's Test function, passing in R_(A), M_(A), S_(K)[R_(T)|R_(A)|C₁|M_(A)];

[5552] f. System checks response from ChipT. If the response is 1, then ChipA is considered authentic. If 0, ChipA is considered invalid.
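Steps a to f can be sketched as a small Python simulation. This is a sketch under stated assumptions rather than the chip's implementation: the value of `C1`, the `advance` function used to update R, and the class and method names are all hypothetical, and Test is assumed to compare against the value of R it held before advancing.

```python
import hashlib
import hmac

C1 = b"\x01" * 4  # hypothetical value for the padding constant C1


def sign(key, *fields):
    """S_K[X] as HMAC-SHA1 over the concatenated fields."""
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()


def advance(r):
    """Hypothetical stand-in for the chip's update of R."""
    return hashlib.sha1(r).digest()


class Chip:
    def __init__(self, key, seed, memory=b""):
        self._k, self._r, self.m = key, seed, memory

    def random(self):
        return self._r  # does not advance R

    def read(self, x):
        self._r = advance(self._r)
        return self._r, self.m, sign(self._k, x, self._r, C1, self.m)

    def test(self, x, y, z):
        expected = sign(self._k, self._r, x, C1, y)
        self._r = advance(self._r)
        # compare_digest avoids data-dependent comparison time
        return 1 if hmac.compare_digest(expected, z) else 0


def authenticated_read(chip_t, chip_a):
    r_t = chip_t.random()              # steps a, b
    r_a, m_a, sig = chip_a.read(r_t)   # steps c, d
    ok = chip_t.test(r_a, m_a, sig)    # step e
    return m_a if ok == 1 else None    # step f
```

Note that `authenticated_read`, which plays the part of System, never handles K; it only relays values between the two chips.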

[5553] The data flow for read authentication is shown in FIG. 334.

[5554] The protocol allows System to simply pass data from one chip to another, with no special processing. The protection relies on ChipT being trusted, even though System does not know K.

[5555] When ChipT is physically separate from System (e.g. a chip on a board connected to System), System must also occasionally (based on the system clock, for example) call ChipT's Test function with bad data, expecting a 0 response. This is to prevent someone from inserting a fake ChipT into the system that always returns 1 for the Test function.

[5556] 8.4 Writes

[5557] In this case, the System wants to update M in some chip referred to as ChipU. This can be non-authenticated (for example, anyone is allowed to count down the amount of consumable remaining), or authenticated (for example, replenishing the amount of consumable remaining).

[5558] 8.4.1 Non-Authenticated Writes

[5559] This is the most frequent type of write, and takes place between the System and the consumable during normal everyday operation. In this kind of write, System wants to change M in a way that doesn't require special authorization. For example, the System could be decrementing the amount of consumable remaining. Although System does not need to know K or even have access to a trusted chip, System must follow a non-authenticated write by an authenticated read if it needs to know that the write was successful.

[5560] The protocol requires the following publicly available function:

[5561] Write[X] Writes X over those parts of M subject to P₀ and the existing value for M.

[5562] To authenticate a write of M_(new) to ChipU's memory M:

[5563] a. System calls ChipU's Write function, passing in M_(new);

[5564] b. The authentication procedure for a Read is carried out (see Section 8.3 on page 664);

[5565] c. If ChipU is authentic and M_(new)=M returned in b, the write succeeded. If not, it failed.
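The effect of Write[X] on M can be sketched as follows. The sketch assumes, purely for illustration, that M is a list of unsigned integer parts and that P₀ is a parallel list of permission names; the function name and this representation are hypothetical.

```python
READ_ONLY = "Read Only"
READ_WRITE = "Read/Write"
DECREMENT_ONLY = "Decrement Only"


def apply_write(memory, new_values, permissions):
    """Return M after writing new_values over it, part by part,
    subject to the permissions and the existing value of each part."""
    result = []
    for old, new, perm in zip(memory, new_values, permissions):
        if perm == READ_WRITE:
            result.append(new)
        elif perm == DECREMENT_ONLY and 0 <= new <= old:
            result.append(new)      # value may only go down
        else:
            result.append(old)      # Read Only, or a disallowed change
    return result


m = [5, 100, 7]                                  # e.g. serial, ink remaining, scratch
p0 = [READ_ONLY, DECREMENT_ONLY, READ_WRITE]
m_after = apply_write(m, [5, 90, 9], p0)
write_succeeded = (m_after == [5, 90, 9])        # the comparison made in step c
```

Because disallowed changes are silently ignored in this sketch, the comparison in step c against an authenticated read is what tells System whether the requested M_(new) was actually stored.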

[5566] 8.4.2 Authenticated Writes

[5567] In this kind of write, System wants to change ChipU's M in an authorized way, without being subject to the permissions that apply during normal operation (P₀). For example, the consumable may be at a refilling station and the normally Decrement Only section of M should be updated to include the new valid consumable. In this case, the chip whose M is being updated must authenticate the writes being generated by the external System and in addition, apply permissions P₁ to ensure that only the correct parts of M are updated.

[5568] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[5569] The protocol requires the following publicly available functions in ChipU:

[5570] Read[X] Advances R, and returns R, M, S_(K)[X|R|C₁|M]. The time taken to calculate the signature must be identical for all inputs.

[5571] WriteA[X, Y, Z] Returns 1, advances R, and replaces M by Y subject to P₁ only if S_(K)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content. This function is identical to ChipT's Test function except that it additionally writes Y over those parts of M subject to P₁ when the signature matches.

[5572] Authenticated writes require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variables and function:

[5573] CountRemaining Part of M that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P₀ for this part of M need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M (assuming ChipS's P₁ allows that part of M to be updated).

[5574] Q Part of M that contains the write permissions for updating ChipU's M. By adding Q to ChipS we allow different ChipSs that can update different parts of M_(U). Permissions in ChipS's P₀ for this part of M need to be ReadOnly once ChipS has been set up. Therefore Q can only be updated by another ChipS that will perform updates to that part of M.

[5575] SignM[V,W,X,Y,Z] Advances R, decrements CountRemaining and returns R, Z_(QX) (Z applied to X with permissions Q), followed by S_(K)[W|R|C₁|Z_(QX)] only if S_(K)[V|W|C₁|X]=Y and CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[5576] To update ChipU's M vector:

[5577] a. System calls ChipU's Read function, passing in 0 as the input parameter;

[5578] b. ChipU produces R_(U), M_(U), S_(K)[0|R_(U)|C₁|M_(U)] and returns these to System;

[5579] c. System calls ChipS's SignM function, passing in 0 (as used in a), R_(U), M_(U), S_(K)[0|R_(U)|C₁|M_(U)], and M_(D) (the desired vector to be written to ChipU);

[5580] d. ChipS produces R_(S), M_(QD) (processed by running M_(D) against M_(U) using Q) and S_(K)[R_(U)|R_(S)|C₁|M_(QD)] if the inputs were valid, and 0 for all outputs if the inputs were not valid;

[5581] e. If values returned in d are non zero, then ChipU is considered authentic. System can then call ChipU's WriteA function with these values;

[5582] f. ChipU should return a 1 to indicate success. A 0 should only be returned if the data generated by ChipS is incorrect (e.g. a transmission error).
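The exchange in steps a to f can be sketched in Python as follows. All names and encodings are assumptions for illustration: M is modelled as a list of small unsigned integers, `ZERO` is a hypothetical encoding of the 0 passed to Read, `advance` is a stand-in for the chip's update of R, and `None` stands in for the "all 0s" response.

```python
import hashlib
import hmac

C1 = b"\x01" * 4   # hypothetical value for the padding constant C1
ZERO = bytes(20)   # hypothetical encoding of the "0" passed to Read
RW, DEC, RO = "Read/Write", "Decrement Only", "Read Only"


def sign(key, *fields):
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()


def advance(r):
    return hashlib.sha1(r).digest()   # stand-in for the chip's R update


def encode(vector):
    return b"".join(v.to_bytes(4, "big") for v in vector)


def apply(old, new, perms):
    """Write new over old, part by part, subject to perms."""
    return [n if p == RW or (p == DEC and n <= o) else o
            for o, n, p in zip(old, new, perms)]


class ChipU:
    def __init__(self, key, seed, memory, p1):
        self.k, self.r, self.m, self.p1 = key, seed, list(memory), p1

    def read(self, x):
        self.r = advance(self.r)
        return self.r, list(self.m), sign(self.k, x, self.r, C1, encode(self.m))

    def write_a(self, x, y, z):
        if not hmac.compare_digest(sign(self.k, self.r, x, C1, encode(y)), z):
            return 0
        self.m = apply(self.m, y, self.p1)
        self.r = advance(self.r)
        return 1


class ChipS:
    def __init__(self, key, seed, q, count_remaining):
        self.k, self.r, self.q = key, seed, q
        self.count_remaining = count_remaining

    def sign_m(self, v, w, x, y, z):
        valid = hmac.compare_digest(sign(self.k, v, w, C1, encode(x)), y)
        if not valid or self.count_remaining <= 0:
            return None               # the text returns all 0s here
        self.r = advance(self.r)
        self.count_remaining -= 1
        z_qx = apply(x, z, self.q)
        return self.r, z_qx, sign(self.k, w, self.r, C1, encode(z_qx))


def authenticated_write(chip_u, chip_s, m_desired):
    r_u, m_u, sig_u = chip_u.read(ZERO)                        # steps a, b
    result = chip_s.sign_m(ZERO, r_u, m_u, sig_u, m_desired)   # steps c, d
    if result is None:
        return 0
    r_s, m_qd, sig_s = result
    return chip_u.write_a(r_s, m_qd, sig_s)                    # steps e, f
```

In this sketch a ChipS holding a different key fails the check inside `sign_m`, and a ChipS whose CountRemaining has reached 0 refuses to sign, so in both cases ChipU's M is left unchanged.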

[5583] The data flow for authenticated writes is shown in FIG. 335.

[5584] Note that Q in ChipS is part of ChipS's M. This allows a user to set up ChipS with a permission set for upgrades. This should be done to ChipS and that part of M designated by P₀ set to ReadOnly before ChipS is programmed with K_(U). If ChipS is programmed with K_(U) first, there is a risk of someone obtaining a half-setup ChipS and changing all of M_(U) instead of only the sections specified by Q.

[5585] The same is true of CountRemaining. The CountRemaining value needs to be set up (including making it ReadOnly in P₀) before ChipS is programmed with K_(U). ChipS is therefore programmed to only perform a limited number of SignM operations (thereby limiting compromise exposure if a ChipS is stolen). Thus ChipS would itself need to be upgraded with a new CountRemaining every so often.

[5586] 8.4.3 Updating Permissions for Future Writes

[5587] In order to reduce exposure to accidental and malicious attacks on P and certain parts of M, only authorized users are allowed to update P. Writes to P are the same as authorized writes to M, except that they update P_(n) instead of M. Initially (at manufacture), P is set to be Read/Write for all parts of M. As different processes fill up different parts of M, they can be sealed against future change by updating the permissions. Updating a chip's P₀ changes permissions for unauthorized writes, and updating P₁ changes permissions for authorized writes.

[5588] P_(n) is only allowed to change to be a more restrictive form of itself. For example, initially all parts of M have permissions of Read/Write. A permission of Read/Write can be updated to Decrement Only or Read Only. A permission of Decrement Only can be updated to become Read Only. A Read Only permission cannot be further restricted.
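The rule that a permission may only become more restrictive can be sketched as an ordering on the three permission values. The numeric levels and the function name are assumptions made for illustration.

```python
# Higher number = more restrictive (hypothetical encoding).
RESTRICTION_LEVEL = {"Read/Write": 0, "Decrement Only": 1, "Read Only": 2}


def update_permission(current: str, requested: str) -> str:
    """Return requested if it is more restrictive than current,
    otherwise leave the permission unchanged."""
    if RESTRICTION_LEVEL[requested] > RESTRICTION_LEVEL[current]:
        return requested
    return current


sealed = update_permission("Read/Write", "Read Only")       # allowed
reopened = update_permission("Read Only", "Read/Write")     # ignored
```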

[5589] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[5590] The protocol requires the following publicly available functions in ChipU:

[5591] Random Returns R (does not advance R).

[5592] SetPermission[n,X,Y,Z] Advances R, and updates P_(n) according to Y and returns 1 followed by the resultant P_(n) only if S_(K)[R|X|Y|C₂]=Z. Otherwise returns 0. P_(n) can only become more restricted. Passing in 0 for any permission leaves it unchanged (passing in Y=0 returns the current P_(n)).

[5593] Authenticated writes of permissions require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variables and function:

[5594] CountRemaining Part of M that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P₀ for this part of M need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M (assuming ChipS's P₁ allows that part of M to be updated).

[5595] SignP[X,Y] Advances R, decrements CountRemaining and returns R and S_(K)[X|R|Y|C₂] only if CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[5596] To update ChipU's P_(n):

[5597] a. System calls ChipU's Random function;

[5598] b. ChipU returns R_(U) to System;

[5599] c. System calls ChipS's SignP function, passing in R_(U) and P_(D) (the desired P to be written to ChipU);

[5600] d. ChipS produces R_(S) and S_(K)[R_(U)|R_(S)|P_(D)|C₂] if it is still permitted to produce signatures;

[5601] e. If values returned in d are non zero, then System can call ChipU's SetPermission function with the desired n, R_(S), P_(D) and S_(K)[R_(U)|R_(S)|P_(D)|C₂];

[5602] f. ChipU verifies the received signature against S_(K)[R_(U)|R_(S)|P_(D)|C₂] and applies P_(D) to P_(n) if the signature matches;

[5603] g. System checks 1st output parameter. 1=success, 0=failure.

[5604] The data flow for authenticated writes to permissions is shown in FIG. 336 below.

[5605] 8.5 Programming K

[5606] In this case, we have a factory chip (ChipF) connected to a System. The System wants to program the key in another chip (ChipP). System wants to avoid passing the new key to ChipP in the clear, and also wants to avoid the possibility of the key-upgrade message being replayed on another ChipP (even if the user doesn't know the key).

[5607] The protocol assumes that ChipF and ChipP already share a secret key K_(old). This key is used to ensure that only a chip that knows K_(old) can set K_(new).

[5608] The protocol requires the following publicly available functions in ChipP:

[5609] Random Returns R (does not advance R).

[5610] ReplaceKey[X, Y, Z] Replaces K by S_(Kold)[R|X|C₃]⊕Y, advances R, and returns 1 only if S_(Kold)[X|Y|C₃]=Z. Otherwise returns 0. The time taken to calculate signatures and compare values must be identical for all inputs.

[5611] And the following data and functions in ChipF:

[5612] CountRemaining Part of M that contains the number of signatures that ChipF is allowed to generate. Decrements with each successful call to GetProgramKey. Permissions in P for this part of M need to be ReadOnly once ChipF has been set up. Therefore it can only be updated by a ChipS that has authority to perform updates to that part of M.

[5613] K_(new) The new key to be transferred from ChipF to ChipP. Must not be visible.

[5614] SetPartialKey[X,Y] If word X of K_(new) has not yet been set, set word X of K_(new) to Y and return 1. Otherwise return 0. This function allows K_(new) to be programmed in multiple steps, thereby allowing different people or systems to know different parts of the key (but not the whole K_(new)). K_(new) is stored in ChipF's flash memory. Since there is a small number of ChipFs, it is theoretically not necessary to store the inverse of K_(new), but it is stronger protection to do so.

[5615] GetProgramKey[X] Advances R_(F), decrements CountRemaining, outputs R_(F), the encrypted key S_(Kold)[X|R_(F)|C₃]⊕K_(new) and a signature of the first two outputs plus C₃ if CountRemaining>0. Otherwise outputs 0. The time to calculate the encrypted key & signature must be identical for all inputs.

[5616] To update ChipP's key:

[5617] a. System calls ChipP's Random function;

[5618] b. ChipP returns R_(P) to System;

[5619] c. System calls ChipF's GetProgramKey function, passing in the result from b;

[5620] d. ChipF updates R_(F), then calculates and returns R_(F), S_(Kold)[R_(P)|R_(F)|C₃]⊕K_(new), and S_(Kold)[R_(F)|S_(Kold)[R_(P)|R_(F)|C₃]⊕K_(new)|C₃];

[5621] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in the response from d;

[5622] f. System checks response from ChipP. If the response is 1, then K_(P) has been correctly updated to K_(new). If the response is 0, K_(P) has not been updated.
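The key transfer in steps a to f can be sketched in Python as follows. This is an illustrative sketch only: the value of `C3`, the `advance` function, the class names and the use of `None` for the "outputs 0" case are assumptions, and keys are assumed to be 20 bytes long so that they can be XORed with a 160-bit HMAC-SHA1 result.

```python
import hashlib
import hmac

C3 = b"\x03" * 4  # hypothetical value for the padding constant C3


def sign(key, *fields):
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()


def advance(r):
    return hashlib.sha1(r).digest()   # stand-in for the chip's R update


def xor_bytes(a, b):
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


class ChipP:
    def __init__(self, key, seed):
        self.k, self.r = key, seed

    def random(self):
        return self.r  # does not advance R

    def replace_key(self, x, y, z):
        # Check the signature over the first two parameters plus C3.
        if not hmac.compare_digest(sign(self.k, x, y, C3), z):
            return 0
        # Decrypt: S_Kold[R|X|C3] xor Y yields K_new.
        self.k = xor_bytes(sign(self.k, self.r, x, C3), y)
        self.r = advance(self.r)
        return 1


class ChipF:
    def __init__(self, k_old, k_new, seed, count_remaining):
        self.k_old, self.k_new, self.r = k_old, k_new, seed
        self.count_remaining = count_remaining

    def get_program_key(self, x):
        if self.count_remaining <= 0:
            return None               # the text outputs 0 here
        self.r = advance(self.r)
        self.count_remaining -= 1
        encrypted = xor_bytes(sign(self.k_old, x, self.r, C3), self.k_new)
        return self.r, encrypted, sign(self.k_old, self.r, encrypted, C3)


def program_key(chip_p, chip_f):
    r_p = chip_p.random()                      # steps a, b
    response = chip_f.get_program_key(r_p)     # steps c, d
    if response is None:
        return 0
    return chip_p.replace_key(*response)       # steps e, f
```

In this sketch K_(new) only ever leaves ChipF XORed with a value that depends on K_(old), R_(P) and R_(F), so System relays the message without being able to read the key.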

[5623] The data flow for key updates is shown in FIG. 337.

[5624] Note that K_(new) is never passed in the open. An attacker could send its own R_(P), but cannot produce S_(Kold)[R_(P)|R_(F)|C₃] without K_(old). The third parameter, a signature, is sent to ensure that ChipP can determine if either of the first two parameters have been changed en route.

[5625] CountRemaining needs to be set up in M_(F) (including making it ReadOnly in P) before ChipF is programmed with K_(P). ChipF should therefore be programmed to only perform a limited number of GetProgramKey operations (thereby limiting compromise exposure if a ChipF is stolen). An authorized ChipS can be used to update this counter if necessary (see Section 8.4 on page 665).

[5626] 8.5.1 Chicken and Egg

[5627] Of course, for the Program Key protocol to work, ChipF and ChipP must both know K_(old). Obviously both chips had to be programmed with K_(old), and thus K_(old) can be thought of as an older K_(new): K_(old) can be placed in chips if another ChipF knows K_(older), and so on.

[5628] Although this process allows a chain of reprogramming of keys, with each stage secure, at some stage the very first key (K_(first)) must be placed in the chips. K_(first) is in fact programmed with the chip's microcode at the manufacturing test station as the last step in manufacturing test. K_(first) can be a manufacturing batch key, changed for each batch or for each customer etc, and can have as short a life as desired. Compromising K_(first) need not result in a complete compromise of the chain of Ks.

[5629] 9 Multiple Key Single Memory Vector

[5630] 9.1 Protocol Background

[5631] This protocol set is an extension to the single key single memory vector protocol set, and is provided for two reasons:

[5632] the multiple key multiple memory vector protocol set defined in this document is simply an extension of this one; and

[5633] it is useful in its own right

[5634] The multiple key protocol set is typically useful for applications where there are multiple types of systems and consumables, and they need to work with each other in various ways. This is typically in the following situations:

[5635] when different systems want to share some consumables, but not others. For example printer models may share some ink cartridges and not share others.

[5636] when there are different owners of data in M. Part of the memory vector may be owned by one company (e.g. the speed of the printer) and another may be owned by another (e.g. the serial number of the chip). In this case a given key K_(n) needs to be able to write to a given part of M, and other keys K_(n) need to be disallowed from writing to these same areas.

[5637] 9.2 Requirements of Protocol

[5638] Each QA Chip contains the following values:

[5639] N The maximum number of keys known to the chip.

[5640] K_(N) Array of N secret keys used for calculating F_(Kn)[X] where K_(n) is the nth element of the array. Each K_(n) must not be stored directly in the QA Chip. Instead, each chip needs to store a single random number R_(K) (different for each chip), K_(n)⊕R_(K), and ¬K_(n)⊕R_(K). The stored K_(n)⊕R_(K) can be XORed with R_(K) to obtain the real K_(n). Although ¬K_(n)⊕R_(K) must be stored to protect against differential attacks, it is not used.

[5641] R Current random number used to ensure time varying messages. Each chip instance must be seeded with a different initial value. Changes for each signature generation.

[5642] M Memory vector of QA Chip. A fixed part of M contains N in ReadOnly form so users of the chip can know the number of keys known by the chip.

[5643] P N+1 element array of access permissions for each part of M. Entry 0 holds access permissions for non-authenticated writes to M (no key required). Entries 1 to N hold access permissions for authenticated writes to M, one for each K. Permission choices for each part of M are Read Only, Read/Write, and Decrement Only.

[5644] C 3 constants used for generating signatures. C₁, C₂, and C₃ are constants that pad out a submessage to a hashing boundary, and all 3 must be different.

[5645] Each QA Chip contains the following private function:

[5646] S_(Kn)[N,X] Internal function only. Returns S_(Kn)[X], the result of applying a digital signature function S to X based upon the appropriate key K_(n). The digital signature must be long enough to counter the chances of someone generating a random signature. The length depends on the signature scheme chosen, although the scheme chosen for the QA Chip is HMAC-SHA1 (see Section 13 on page 691), and therefore the length of the signature is 160 bits.

[5647] Additional functions are required in certain QA Chips, but these are described as required.

[5648] 9.3 Reads

[5649] As with the single key scenario, we have a trusted chip (ChipT) connected to a System. The System wants to authenticate an object that contains a non-trusted chip (ChipA). In effect, the System wants to know that it can securely read a memory vector (M) from ChipA: to be sure that ChipA is valid and that M has not been altered.

[5650] The protocol requires the following publicly available functions:

[5651] Random Returns R (does not advance R).

[5652] Read[n, X] Advances R, and returns R, M, S_(Kn)[X|R|C₁|M]. The time taken to calculate the signature must not be based on the contents of X, R, M, or K.

[5653] Test[n, X, Y, Z] Advances R and returns 1 if S_(Kn)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content.

[5654] To authenticate ChipA and read ChipA's memory M:

[5655] a. System calls ChipT's Random function;

[5656] b. ChipT returns R_(T) to System;

[5657] c. System calls ChipA's Read function, passing in some key number n1 and the result from b;

[5658] d. ChipA updates R_(A), then calculates and returns R_(A), M_(A), S_(KAn1)[R_(T)|R_(A)|C₁|M_(A)];

[5659] e. System calls ChipT's Test function, passing in n2, R_(A), M_(A), S_(KAn1)[R_(T)|R_(A)|C₁|M_(A)];

[5660] f. System checks response from ChipT. If the response is 1, then ChipA is considered authentic. If 0, ChipA is considered invalid.

[5661] The choice of n1 and n2 must be such that ChipA's K_(n1)=ChipT's K_(n2).

[5662] The data flow for read authentication is shown in FIG. 338.

[5663] The protocol allows System to simply pass data from one chip to another, with no special processing. The protection relies on ChipT being trusted, even though System does not know K.

[5664] When ChipT is physically separate from System (e.g. a chip on a board connected to System), System must also occasionally (based on the system clock, for example) call ChipT's Test function with bad data, expecting a 0 response. This is to prevent someone from inserting a fake ChipT into the system that always returns 1 for the Test function.

[5665] It is important that n1 is chosen by System. Otherwise ChipA would need to return N_(A) sets of signatures for each read, since ChipA does not know which of the keys will satisfy ChipT. Similarly, System must also choose n2, so it can potentially restrict the number of keys in ChipT that are matched against (otherwise ChipT would have to match against all its keys). This is important in order to restrict how different keys are used. For example, say that ChipT contains 6 keys, keys 0-2 are for various printer-related upgrades, and keys 3-6 are for inks. ChipA contains say 4 keys, one key for each printer model. At power-up, System goes through each of ChipA's keys 0-3, trying each out against ChipT's keys 3-6. System doesn't try to match against ChipT's keys 0-2. Otherwise knowledge of a speed-upgrade key could be used to provide ink QA Chips. This matching needs to be done only once (e.g. at power up). Once matching keys are found, System can continue to use those key numbers.
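The power-up key matching described above can be sketched in Python. The sketch is illustrative only: the class and function names, the value of `C1`, the `advance` function and the key values are hypothetical, and the key numbering used in the usage note (ink keys numbered 3 to 5 in a six-key ChipT) is chosen for the sketch rather than taken from the example above.

```python
import hashlib
import hmac

C1 = b"\x01" * 4  # hypothetical value for the padding constant C1


def sign(key, *fields):
    return hmac.new(key, b"".join(fields), hashlib.sha1).digest()


def advance(r):
    return hashlib.sha1(r).digest()   # stand-in for the chip's R update


class Chip:
    def __init__(self, keys, seed, memory=b""):
        self.keys, self.r, self.m = list(keys), seed, memory

    def random(self):
        return self.r  # does not advance R

    def read(self, n, x):
        self.r = advance(self.r)
        return self.r, self.m, sign(self.keys[n], x, self.r, C1, self.m)

    def test(self, n, x, y, z):
        expected = sign(self.keys[n], self.r, x, C1, y)
        self.r = advance(self.r)
        return 1 if hmac.compare_digest(expected, z) else 0


def find_matching_keys(chip_a, chip_t, a_numbers, t_numbers):
    """Try each permitted (n1, n2) pair; return the first pair for which
    ChipT accepts ChipA's signature, or None if there is no match."""
    for n1 in a_numbers:
        for n2 in t_numbers:
            r_t = chip_t.random()
            r_a, m_a, sig = chip_a.read(n1, r_t)
            if chip_t.test(n2, r_a, m_a, sig) == 1:
                return n1, n2
    return None
```

For instance, System might call `find_matching_keys(chip_a, chip_t, range(4), range(3, 6))` once at power-up and then keep using the returned pair. Restricting `t_numbers` to the ink keys is what prevents a chip that only knows an upgrade key from being accepted as an ink chip.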

[5666] Since System needs to know N_(T) and N_(A), part of M is used to hold N (e.g. in Read Only form), and the system can obtain it by calling the Read function, passing in key 0.

[5667] 9.4 Writes

[5668] As with the single key scenario, the System wants to update M in ChipU. As before, this can be done in a non-authenticated or an authenticated way.

[5669] 9.4.1 Non-Authenticated Writes

[5670] This is the most frequent type of write, and takes place between the System and the consumable during normal everyday operation. In this kind of write, System wants to change M subject to P. For example, the System could be decrementing the amount of consumable remaining. Although System does not need to know any of the Ks or even have access to a trusted chip to perform the write, System must follow a non-authenticated write by an authenticated read if it needs to know that the write was successful.

[5671] The protocol requires the following publicly available function:

[5672] Write[X] Writes X over those parts of M subject to P₀ and the existing value for M.

[5673] To authenticate a write of M_(new) to ChipU's memory M:

[5674] a. System calls ChipU's Write function, passing in M_(new);

[5675] b. The authentication procedure for a Read is carried out (see Section 9.3 on page 671);

[5676] c. If ChipU is authentic and M_(new)=M returned in b, the write succeeded. If not, it failed.

[5677] 9.4.2 Authenticated Writes

[5678] In this kind of write, System wants to change ChipU's M in an authorized way, without being subject to the permissions that apply during normal operation (P₀). For example, the consumable may be at a refilling station and the normally Decrement Only section of M should be updated to include the new valid consumable. In this case, the chip whose M is being updated must authenticate the writes being generated by the external System and in addition, apply the appropriate permission for the key to ensure that only the correct parts of M are updated. Having a different permission for each key is required because, when multiple keys are involved, not all keys should necessarily be given open access to M. For example, suppose M contains printer speed and a counter of money available for franking. A ChipS that updates printer speed should not be capable of updating the amount of money. Since P₀ is used for non-authenticated writes, each K_(n) has a corresponding permission P_(n+1) that determines what can be updated in an authenticated write.

[5679] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[5680] The protocol requires the following publicly available functions in ChipU:

[5681] Read[n, X] Advances R, and returns R, M, S_(Kn)[X|R|C₁|M]. The time taken to calculate the signature must not be based on the contents of X, R, M, or K.

[5682] WriteA[n, X, Y, Z] Advances R, replaces M by Y subject to P_(n+1), and returns 1 only if S_(Kn)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content. This function is identical to ChipT's Test function except that it additionally writes Y subject to P_(n+1) to its M when the signature matches.

[5683] Authenticated writes require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variables and function:

[5684] CountRemaining Part of M that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P₀ for this part of M need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M (assuming ChipS's P allows that part of M to be updated).

[5685] Q Part of M that contains the write permissions for updating ChipU's M. By adding Q to ChipS we allow different ChipSs that can update different parts of M_(U). Permissions in ChipS's P₀ for this part of M need to be ReadOnly once ChipS has been set up. Therefore Q can only be updated by another ChipS that will perform updates to that part of M.

[5686] SignM[n,V,W,X,Y,Z] Advances R, decrements CountRemaining and returns R, Z_(QX) (Z applied to X with permissions Q), S_(Kn)[W|R|C₁|Z_(QX)] only if Y=S_(Kn)[V|W|C₁|X] and CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[5687] To update ChipU's M vector:

[5688] a. System calls ChipU's Read function, passing in n1 and 0 as the input parameters;

[5689] b. ChipU produces R_(U), M_(U), S_(Kn1)[0|R_(U)|C₁|M_(U)] and returns these to System;

[5690] c. System calls ChipS's SignM function, passing in n2 (the key to be used in ChipS), 0 (as used in a), R_(U), M_(U), S_(Kn1)[0|R_(U)|C₁|M_(U)], and M_(D) (the desired vector to be written to ChipU);

[5691] d. ChipS produces R_(S), M_(QD) (processed by running M_(D) against M_(U) using Q) and S_(Kn2)[R_(U)|R_(S)|C₁|M_(QD)] if the inputs were valid, and 0 for all outputs if the inputs were not valid.

[5692] e. If the values returned in d are non-zero, then ChipU is considered authentic. System can then call ChipU's WriteA function with these values from d.

[5693] f. ChipU should return a 1 to indicate success. A 0 should only be returned if the data generated by ChipS is incorrect (e.g. a transmission error).

[5694] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).
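The write sequence in steps a to f can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the chip's implementation: S is modelled as HMAC-SHA1, each chip holds a single shared key in place of the key index n, byte-granular boolean masks stand in for Q and P_(n+1), C₁ is a one-byte placeholder, and R is advanced by an arbitrary hash step. All class and function names are hypothetical.

```python
import hashlib
import hmac
import os

C1 = b"\x01"      # assumed stand-in for the padding constant C1
ZERO = bytes(20)  # the "0" passed to Read in step a


def sign(key, *parts):
    """S_K[a|b|...] modelled as HMAC-SHA1 over the concatenated parts."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def masked_write(current, desired, writable):
    """Take the desired byte where the mask allows it, else keep the current byte."""
    return bytes(d if w else c for c, d, w in zip(current, desired, writable))


class Chip:
    def __init__(self, key):
        self.key = key
        self.r = os.urandom(20)

    def advance_r(self):
        # Assumed update rule; the text only says that R advances.
        self.r = hashlib.sha1(self.r).digest()


class ChipU(Chip):
    def __init__(self, key, m, p_auth):
        super().__init__(key)
        self.m = m            # memory vector M
        self.p_auth = p_auth  # mask standing in for P_(n+1)

    def read(self, x):
        self.advance_r()
        return self.r, self.m, sign(self.key, x, self.r, C1, self.m)

    def write_a(self, x, y, z):
        # Verify against the current R, then advance it.
        ok = hmac.compare_digest(sign(self.key, self.r, x, C1, y), z)
        self.advance_r()
        if ok:
            self.m = masked_write(self.m, y, self.p_auth)
        return 1 if ok else 0


class ChipS(Chip):
    def __init__(self, key, q, count_remaining):
        super().__init__(key)
        self.q = q
        self.count_remaining = count_remaining

    def sign_m(self, v, w, x, y, z):
        valid = hmac.compare_digest(sign(self.key, v, w, C1, x), y)
        if not valid or self.count_remaining <= 0:
            return None  # stands in for the all-zero response
        self.advance_r()
        self.count_remaining -= 1
        m_qd = masked_write(x, z, self.q)
        return self.r, m_qd, sign(self.key, w, self.r, C1, m_qd)


def authenticated_write(chip_u, chip_s, m_desired):
    r_u, m_u, sig_u = chip_u.read(ZERO)                      # steps a, b
    reply = chip_s.sign_m(ZERO, r_u, m_u, sig_u, m_desired)  # steps c, d
    if reply is None:
        return 0
    r_s, m_qd, sig_s = reply
    return chip_u.write_a(r_s, m_qd, sig_s)                  # steps e, f
```

In this sketch a ChipS whose Q mask marks only some bytes as writable can change only those bytes of ChipU's M, and a ChipS whose CountRemaining has reached zero can no longer authorize any write.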

[5695] The data flow for authenticated writes is shown in FIG. 339 below.

[5696] Note that Q in ChipS is part of ChipS's M. This allows a user to set up ChipS with a permission set for upgrades. This should be done to ChipS and that part of M designated by P₀ set to ReadOnly before ChipS is programmed with K_(U). If ChipS is programmed with K_(U) first, there is a risk of someone obtaining a half-setup ChipS and changing all of M_(U) instead of only the sections specified by Q.

[5697] In addition, CountRemaining in ChipS needs to be set up (including making it ReadOnly in P_(S)) before ChipS is programmed with K_(U). ChipS should therefore be programmed to only perform a limited number of SignM operations (thereby limiting compromise exposure if a ChipS is stolen). Thus ChipS would itself need to be upgraded with a new CountRemaining every so often.

[5698] 9.4.3 Updating Permissions for Future Writes

[5699] In order to reduce exposure to accidental and malicious attacks on P (and certain parts of M), only authorized users are allowed to update P. Writes to P are the same as authorized writes to M, except that they update P_(n) instead of M. Initially (at manufacture), P is set to be Read/Write for all parts of M. As different processes fill up different parts of M, they can be sealed against future change by updating the permissions. Updating a chip's P₀ changes permissions for unauthorized writes, and updating P_(n+1) changes permissions for authorized writes with key K_(n).

[5700] P_(n) is only allowed to change to be a more restrictive form of itself. For example, initially all parts of M have permissions of Read/Write. A permission of Read/Write can be updated to Decrement Only or Read Only. A permission of Decrement Only can be updated to become Read Only. A Read Only permission cannot be further restricted.
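The one-way tightening of permissions can be illustrated with a small Python sketch. The numeric encoding below (Read/Write lowest, Read Only highest) is an assumption made for illustration; the text does not specify how permissions are encoded. The names Permission and restrict are hypothetical.

```python
from enum import IntEnum


class Permission(IntEnum):
    # Assumed ordering: a larger value is more restrictive.
    READ_WRITE = 0
    DECREMENT_ONLY = 1
    READ_ONLY = 2


def restrict(current, requested):
    """Return the resulting permission; a request that would loosen it is ignored."""
    return max(current, requested)
```

With this encoding, requesting READ_WRITE (value 0) never changes an existing permission, which is one possible reading of the rule that passing in 0 leaves a permission unchanged.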

[5701] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[5702] The protocol requires the following publicly available functions in ChipU:

[5703] Random

Returns R (does not advance R).

[5704] SetPermission[n,p,X,Y,Z] Advances R, and updates P_(p) according to Y and returns 1 followed by the resultant P_(p) only if S_(Kn)[R|X|Y|C₂]=Z. Otherwise returns 0. P_(p) can only become more restricted. Passing in 0 for any permission leaves it unchanged (passing in Y=0 returns the current P_(p)).

[5705]

[5706] Authenticated writes of permissions require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variables and function:

[5707] CountRemaining Part of M that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P₀ for this part of M need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M (assuming ChipS's P_(n) allows that part of M to be updated).

[5708] SignP[n,X,Y] Advances R, decrements CountRemaining and returns R and S_(Kn)[X|R|Y|C₂] only if CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[5709] To update ChipU's P_(n):

[5710] a. System calls ChipU's Random function;

[5711] b. ChipU returns R_(U) to System;

[5712] c. System calls ChipS's SignP function, passing in n1, R_(U) and P_(D) (the desired P to be written to ChipU);

[5713] d. ChipS produces R_(S) and S_(Kn1)[R_(U)|R_(S)|P_(D)|C₂] if it is still permitted to produce signatures.

[5714] e. If the values returned in d are non-zero, System can then call ChipU's SetPermission function with n2, the desired permission entry p, R_(S), P_(D) and S_(Kn1)[R_(U)|R_(S)|P_(D)|C₂].

[5715] f. ChipU verifies the received signature against S_(Kn2)[R_(U)|R_(S)|P_(D)|C₂] and applies P_(D) to P_(n) if the signature matches.

[5716] g. System checks the first output parameter. 1=success, 0=failure.

[5717] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).

[5718] The data flow for authenticated writes to permissions is shown in FIG. 340 below.

[5719] 9.4.4 Protecting M in a Multiple Key System

[5720] To protect the appropriate part of M, the SetPermission function must be called after the part of M has been set to the desired value.

[5721] For example, if adding a serial number to an area of M that is currently ReadWrite so that no one is permitted to update the number again:

[5722] the Write function is called to write the serial number to M

[5723] SetPermission is called for n={1, . . . , N} to set that part of M to be ReadOnly for authorized writes using key n−1.

[5724] SetPermission is called for 0 to set that part of M to be ReadOnly for non-authorized writes

[5725] For example, adding a consumable value to M such that only keys 1-2 can update it, and keys 0, and 3-N cannot:

[5726] the Write function is called to write the amount of consumable to M

[5727] SetPermission is called for n={1, 4, 5, . . . , N−1} to set that part of M to be ReadOnly for authorized writes using key n−1. This leaves keys 1 and 2 with ReadWrite permissions.

[5728] SetPermission is called for 0 to set that part of M to be DecrementOnly for non-authorized writes. This allows the amount of consumable to decrement.

[5729] It is possible for someone who knows a key to further restrict other keys, but it is not in anyone's interest to do so.

[5730] 9.5 Programming K

[5731] In this case, we have a factory chip (ChipF) connected to a System. The System wants to program the key in another chip (ChipP). System wants to avoid passing the new key to ChipP in the clear, and also wants to avoid the possibility of the key-upgrade message being replayed on another ChipP (even if the user doesn't know the key).

[5732] The protocol is a simple extension of the single key protocol in that it assumes that ChipF and ChipP already share a secret key K_(old). This key is used to ensure that only a chip that knows K_(old) can set K_(new).

[5733] The protocol requires the following publicly available functions in ChipP:

[5734] Random

Returns R (does not advance R).

[5735] ReplaceKey[n, X, Y, Z] Replaces K_(n) by S_(Kn)[R|X|C₃]⊕Y, advances R, and returns 1 only if S_(Kn)[X|Y|C₃]=Z. Otherwise returns 0. The time taken to calculate signatures and compare values must be identical for all inputs.

[5736] And the following data and functions in ChipF:

[5737] CountRemaining Part of M that contains the number of signatures that ChipF is allowed to generate. Decrements with each successful call to GetProgramKey. Permissions in P for this part of M need to be ReadOnly once ChipF has been set up. Therefore it can only be updated by a ChipS that has authority to perform updates to that part of M.

[5738] K_(new) The new key to be transferred from ChipF to ChipP. Must not be visible.

[5739] SetPartialKey[X,Y] If word X of K_(new) has not yet been set, set word X of K_(new) to Y and return 1. Otherwise return 0. This function allows K_(new) to be programmed in multiple steps, thereby allowing different people or systems to know different parts of the key (but not the whole K_(new)). K_(new) is stored in ChipF's flash memory. Since there is a small number of ChipFs, it is theoretically not necessary to store the inverse of K_(new), but it is stronger protection to do so.

[5740] GetProgramKey[n, X] Advances R_(F), decrements CountRemaining, outputs R_(F), the encrypted key S_(Kn)[X|R_(F)|C₃]⊕K_(new) and a signature of the first two outputs plus C₃ if CountRemaining>0. Otherwise outputs 0. The time to calculate the encrypted key & signature must be identical for all inputs.

[5741] To update ChipP's key:

[5742] a. System calls ChipP's Random function;

[5743] b. ChipP returns R_(P) to System;

[5744] c. System calls ChipF's GetProgramKey function, passing in n1 (the desired key to use) and the result from b;

[5745] d. ChipF updates R_(F), then calculates and returns R_(F), S_(Kn1)[R_(P)|R_(F)|C₃]⊕K_(new), and S_(Kn1)[R_(F)|S_(Kn1)[R_(P)|R_(F)|C₃]⊕K_(new)|C₃];

[5746] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in n2 (the key to use in ChipP) and the response from d;

[5747] f. System checks response from ChipP. If the response is 1, then K_(Pn2) has been correctly updated to K_(new). If the response is 0, K_(Pn2) has not been updated.

[5748] The choice of n1 and n2 must be such that ChipF's K_(n1)=ChipP's K_(n2).
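The key transfer in steps a to f can be sketched in Python as follows. This is an illustration only. It assumes HMAC-SHA1 as S, keys that are exactly 20 bytes long so they can be XORed with a 160-bit signature, a one-byte placeholder for C₃, a single key per chip in place of the key index n, and an arbitrary hash step for advancing R. All names are hypothetical, and the key attribute is exposed only so the sketch can be checked.

```python
import hashlib
import hmac
import os

C3 = b"\x03"  # assumed stand-in for the padding constant C3


def sign(key, *parts):
    """S_K[a|b|...] modelled as HMAC-SHA1 over the concatenated parts."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class ChipF:
    def __init__(self, k_old, k_new, count_remaining):
        self.key = k_old
        self.k_new = k_new
        self.count_remaining = count_remaining
        self.r = os.urandom(20)

    def get_program_key(self, x):
        if self.count_remaining <= 0:
            return None  # stands in for the 0 response
        self.r = hashlib.sha1(self.r).digest()  # assumed rule for advancing R_F
        self.count_remaining -= 1
        encrypted = xor(sign(self.key, x, self.r, C3), self.k_new)
        return self.r, encrypted, sign(self.key, self.r, encrypted, C3)


class ChipP:
    def __init__(self, key):
        self.key = key
        self.r = os.urandom(20)

    def random(self):
        return self.r  # does not advance R

    def replace_key(self, x, y, z):
        ok = hmac.compare_digest(sign(self.key, x, y, C3), z)
        # Computed whether or not the signature matched, then discarded on failure.
        candidate = xor(sign(self.key, self.r, x, C3), y)
        self.r = hashlib.sha1(self.r).digest()
        if ok:
            self.key = candidate
        return 1 if ok else 0


def program_key(chip_f, chip_p):
    r_p = chip_p.random()                            # steps a, b
    reply = chip_f.get_program_key(r_p)              # steps c, d
    if reply is None:
        return 0
    r_f, encrypted, sig = reply
    return chip_p.replace_key(r_f, encrypted, sig)   # steps e, f
```

Because the mask S_(Kn1)[R_(P)|R_(F)|C₃] depends on ChipP's own R_(P), replaying a captured response against a ChipP with a different R_(P) would yield a different, useless key.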

[5749] The data flow for key updates is shown in FIG. 341 below.

[5750] Note that K_(new) is never passed in the open. An attacker could send its own R_(P), but cannot produce S_(Kn1)[R_(P)|R_(F)|C₃] without K_(n1). The signature based on K_(new) is sent to ensure that ChipP will be able to determine if either of the first two parameters have been changed en route.

[5751] CountRemaining needs to be set up in M_(F) (including making it ReadOnly in P) before ChipF is programmed with K_(P). ChipF should therefore be programmed to only perform a limited number of GetProgramKey operations (thereby limiting compromise exposure if a ChipF is stolen). An authorized ChipS can be used to update this counter if necessary (see Section 9.4 on page 673).

[5752] 9.5.1 Chicken and Egg

[5753] As with the single key protocol, for the Program Key protocol to work, ChipF and ChipP must both know K_(old). Obviously both chips had to be programmed with K_(old), and thus K_(old) can be thought of as an older K_(new): K_(old) can be placed in chips if another ChipF knows K_(older), and so on.

[5754] Although this process allows a chain of reprogramming of keys, with each stage secure, at some stage the very first key (K_(first)) must be placed in the chips. K_(first) is in fact programmed with the chip's microcode at the manufacturing test station as the last step in manufacturing test. K_(first) can be a manufacturing batch key, changed for each batch or for each customer etc, and can have as short a life as desired. Compromising K_(first) need not result in a complete compromise of the chain of Ks.

[5755] Depending on the reprogramming requirements, K_(first) can be the same or different for all K_(n).

[5756] 10 Multiple Keys Multiple Memory Vectors

[5757] 10.1 Protocol Background

[5758] This protocol set is a slight restriction of the multiple key single memory vector protocol set, and is the expected protocol. It is a restriction in that M has been optimized for Flash memory utilization.

[5759] M is broken into multiple memory vectors (semi-fixed and variable components) for the purposes of optimizing flash memory utilization. Typically M contains some parts that are fixed at some stage of the manufacturing process (eg a batch number, serial number etc), and once set, are not ever updated. This information does not contain the amount of consumable remaining, and therefore is not read or written to with any great frequency.

[5760] We therefore define M₀ to be the M that contains the frequently updated sections, and the remaining Ms to be rarely written to. Authenticated writes only write to M₀, and non-authenticated writes can be directed to a specific M_(n). This reduces the size of permissions that are stored in the QA Chip (since key-based writes are not required for Ms other than M₀). It also means that M₀ and the remaining Ms can be manipulated in different ways, thereby increasing flash memory longevity.

[5761] 10.2 Requirements of Protocol

[5762] Each QA Chip contains the following values:

[5763] N The maximum number of keys known to the chip.

[5764] T The number of vectors M is broken into.

[5765] K_(N) Array of N secret keys used for calculating F_(Kn)[X] where K_(n) is the nth element of the array. Each K_(n) must not be stored directly in the QA Chip. Instead, each chip needs to store a single random number R_(K) (different for each chip), K_(n)⊕R_(K), and ¬K_(n)⊕R_(K). The stored K_(n)⊕R_(K) can be XORed with R_(K) to obtain the real K_(n). Although ¬K_(n)⊕R_(K) must be stored to protect against differential attacks, it is not used.
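The masked key storage described above can be illustrated with a short Python sketch. It assumes that the second stored value is the bitwise inverse of K_(n) XORed with R_(K), in line with the stored inverse of K_(new) mentioned for ChipF; the function names are hypothetical and the sketch says nothing about how the values are laid out in flash.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def invert(a):
    return bytes(x ^ 0xFF for x in a)


def store_key(k_n, r_k):
    """Return the two values kept in the chip instead of K_n itself."""
    masked = xor(k_n, r_k)                   # K_n XOR R_K
    masked_inverse = xor(invert(k_n), r_k)   # assumed: (NOT K_n) XOR R_K, stored but unused
    return masked, masked_inverse


def recover_key(masked, r_k):
    """XOR the stored value with R_K to obtain the real K_n."""
    return xor(masked, r_k)
```

Under this assumption every bit position holds a 1 in exactly one of the two stored values, whatever the key is.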

[5766] R Current random number used to ensure time varying messages. Each chip instance must be seeded with a different initial value. Changes for each signature generation.

[5767] M_(T) Array of T memory vectors. Only M₀ can be written to with an authorized write, while all Ms can be written to in an unauthorized write. Writes to M₀ are optimized for Flash usage, while updates to any other M_(n) are expensive with regards to Flash utilization, and are expected to be only performed once per section of M_(n). M₁ contains T and N in ReadOnly form so users of the chip can know these two values.

[5768] P_(T+N) T+N element array of access permissions for each part of M. Entries n={0 . . . T−1} hold access permissions for non-authenticated writes to M_(n) (no key required). Entries n={T to T+N−1} hold access permissions for authenticated writes to M₀ for K_(n). Permission choices for each part of M are Read Only, Read/Write, and Decrement Only.

[5769] C 3 constants used for generating signatures. C₁, C₂, and C₃ are constants that pad out a submessage to a hashing boundary, and all 3 must be different.

[5770] Each QA Chip contains the following private function:

[5771] S_(Kn)[N,X] Internal function only. Returns S_(Kn)[X], the result of applying a digital signature function S to X based upon the appropriate key K_(n). The digital signature must be long enough to counter the chances of someone generating a random signature. The length depends on the signature scheme chosen, although the scheme chosen for the QA Chip is HMAC-SHA1, and therefore the length of the signature is 160 bits.

[5772] Additional functions are required in certain QA Chips, but these are described as required.

[5773] 10.3 Reads

[5774] As with the previous scenarios, we have a trusted chip (ChipT) connected to a System. The System wants to authenticate an object that contains a non-trusted chip (ChipA). In effect, the System wants to know that it can securely read a memory vector (M_(t)) from ChipA: to be sure that ChipA is valid and that M has not been altered.

[5775] The protocol requires the following publicly available functions:

[5776] Random

Returns R (does not advance R).

[5777] Read[n, t, X] Advances R, and returns R, M_(t), S_(Kn)[X|R|C₁|M_(t)]. The time taken to calculate the signature must not be based on the contents of X, R, M_(t), or K. If t is invalid, the function assumes t=0.

[5778] Test[n,X, Y, Z] Advances R and returns 1 if S_(Kn)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content.

[5779] To authenticate ChipA and read ChipA's memory M:

[5780] a. System calls ChipT's Random function;

[5781] b. ChipT returns R_(T) to System;

[5782] c. System calls ChipA's Read function, passing in some key number n1, the desired M number t, and the result from b;

[5783] d. ChipA updates R_(A), then calculates and returns R_(A), M_(At), S_(KAn1)[R_(T)|R_(A)|C₁|M_(At)];

[5784] e. System calls ChipT's Test function, passing in n2, R_(A), M_(At), S_(KAn1)[R_(T)|R_(A)|C₁|M_(At)];

[5785] f. System checks response from ChipT. If the response is 1, then ChipA is considered authentic. If 0, ChipA is considered invalid.

[5786] The choice of n1 and n2 must be such that ChipA's K_(n1)=ChipT's K_(n2).
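The authenticated read in steps a to f can be sketched in Python as follows. The sketch assumes HMAC-SHA1 as S (the scheme named above for the QA Chip), a one-byte placeholder for C₁, and an arbitrary hash step for advancing R; the QAChip class and authenticate_read function are hypothetical names introduced for illustration. Fields are concatenated without length prefixes, which is only safe here because R and the signature have fixed lengths.

```python
import hashlib
import hmac
import os

C1 = b"\x01"  # assumed stand-in for the padding constant C1


def sign(key, *parts):
    """S_K[a|b|...] modelled as HMAC-SHA1 over the concatenated parts."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class QAChip:
    def __init__(self, keys, memory_vectors):
        self.keys = list(keys)        # K_0 .. K_(N-1)
        self.m = list(memory_vectors) # M_0 .. M_(T-1)
        self.r = os.urandom(20)

    def _advance_r(self):
        # Assumed update rule; the text only says that R advances.
        self.r = hashlib.sha1(self.r).digest()

    def random(self):
        return self.r  # does not advance R

    def read(self, n, t, x):
        if not 0 <= t < len(self.m):
            t = 0  # an invalid t is treated as t=0
        self._advance_r()
        return self.r, self.m[t], sign(self.keys[n], x, self.r, C1, self.m[t])

    def test(self, n, x, y, z):
        # Compare in constant time against the current R, then advance it.
        ok = hmac.compare_digest(sign(self.keys[n], self.r, x, C1, y), z)
        self._advance_r()
        return 1 if ok else 0


def authenticate_read(chip_t, chip_a, n1, n2, t):
    r_t = chip_t.random()                     # steps a, b
    r_a, m_t, sig = chip_a.read(n1, t, r_t)   # steps c, d
    ok = chip_t.test(n2, r_a, m_t, sig)       # steps e, f
    return ok, m_t
```

System never handles a key in this exchange; it only relays R_(T), R_(A), M_(At) and the signature between the two chips.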

[5787] The data flow for read authentication is shown in FIG. 342 below.

[5788] The protocol allows System to simply pass data from one chip to another, with no special processing. The protection relies on ChipT being trusted, even though System does not know K.

[5789] When ChipT is physically separate from System (eg is a chip on a board connected to System), System must also occasionally (based on the system clock, for example) call ChipT's Test function with bad data, expecting a 0 response. This is to prevent someone from inserting a fake ChipT into the system that always returns 1 for the Test function.

[5790] It is important that n1 is chosen by System. Otherwise ChipA would need to return N_(A) sets of signatures for each read, since ChipA does not know which of the keys will satisfy ChipT. Similarly, System must also choose n2, so it can potentially restrict the number of keys in ChipT that are matched against (otherwise ChipT would have to match against all its keys). This is important in order to restrict how different keys are used. For example, say that ChipT contains 6 keys, keys 0-2 are for various printer-related upgrades, and keys 3-6 are for inks. ChipA contains say 4 keys, one key for each printer model. At power-up, System goes through each of ChipA's keys 0-3, trying each out against ChipT's keys 3-6. System doesn't try to match against ChipT's keys 0-2. Otherwise knowledge of a speed-upgrade key could be used to provide ink QA Chips. This matching needs to be done only once (eg at power up). Once matching keys are found, System can continue to use those key numbers.
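The power-up key matching can be sketched as a nested search in Python. The authenticate callable is a hypothetical stand-in for one complete Read/Test exchange using key n1 in ChipA and key n2 in ChipT; the function name and the key index ranges in the usage note are illustrative assumptions.

```python
def find_matching_keys(authenticate, chip_a_key_numbers, chip_t_key_numbers):
    """Return the first (n1, n2) pair that authenticates, or None.

    Only the ChipT key numbers passed in are tried, so keys reserved for
    other purposes are never matched against.
    """
    for n1 in chip_a_key_numbers:
        for n2 in chip_t_key_numbers:
            if authenticate(n1, n2):
                return n1, n2
    return None
```

System would call this once at power-up, passing only the ChipT key numbers reserved for inks, and then reuse the returned pair for later reads.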

[5791] Since System needs to know N_(T), N_(A), and T_(A), part of M₁ is used to hold N (eg in Read Only form), and the system can obtain it by calling the Read function, passing in key 0 and t=1.

[5792] 10.4 Writes

[5793] As with the previous scenarios, the System wants to update M_(t) in ChipU. As before, this can be done in a non-authenticated and authenticated way.

[5794] 10.4.1 Non-Authenticated Writes

[5795] This is the most frequent type of write, and takes place between the System/consumable during normal everyday operation for M₀, and during the manufacturing process for M_(t).

[5796] In this kind of write, System wants to change M subject to P. For example, the System could be decrementing the amount of consumable remaining. Although System does not need to know any of the Ks or even have access to a trusted chip to perform the write, System must follow a non-authenticated write by an authenticated read if it needs to know that the write was successful.

[5797] The protocol requires the following publicly available function:

[5798] Write[t, X] Writes X over those parts of M_(t) subject to P_(t) and the existing value for M.

[5799] To authenticate a write of M_(new) to ChipU's memory M:

[5800] a. System calls ChipU's Write function, passing in M_(new);

[5801] b. The authentication procedure for a Read is carried out (see Section 9.3 on page 671);

[5802] c. If ChipU is authentic and M_(new)=M returned in b, the write succeeded. If not, it failed.
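How a non-authenticated Write might combine the new value, the permissions P_(t) and the existing contents of M_(t) can be sketched in Python. The sketch assumes that permissions apply per word of the vector and that a Decrement Only word accepts only a strictly smaller value; neither the granularity nor the names used here are taken from the specification.

```python
from enum import IntEnum


class Permission(IntEnum):
    READ_WRITE = 0
    DECREMENT_ONLY = 1
    READ_ONLY = 2


def write(m_t, p_t, x):
    """Return the new contents of M_t after writing X subject to P_t."""
    result = []
    for current, permission, new in zip(m_t, p_t, x):
        if permission == Permission.READ_WRITE:
            result.append(new)
        elif permission == Permission.DECREMENT_ONLY and new < current:
            result.append(new)
        else:
            result.append(current)  # write to this word is ignored
    return result
```

Because a disallowed word is silently left unchanged, System can detect a failed write only by comparing M_(new) with the M returned by the authenticated read in step b.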

[5803] 10.4.2 Authenticated Writes

[5804] In the multiple memory vectors protocol, only M₀ can be written to in an authenticated way. This is because only M₀ is considered to have components that need to be upgraded.

[5805] In this kind of write, System wants to change ChipU's M₀ in an authorized way, without being subject to the permissions that apply during normal operation. For example, the consumable may be at a refilling-station and the normally Decrement Only section of M₀ should be updated to include the new valid consumable. In this case, the chip whose M₀ is being updated must authenticate the writes being generated by the external System and in addition, apply the appropriate permission for the key to ensure that only the correct parts of M₀ are updated. Having a different permission for each key is required as when multiple keys are involved, all keys should not necessarily be given open access to M₀. For example, suppose M₀ contains printer speed and a counter of money available for franking. A ChipS that updates printer speed should not be capable of updating the amount of money. Since P_(0 . . . T−1) is used for non-authenticated writes, each K_(n) has a corresponding permission P_(T+n) that determines what can be updated in an authenticated write.

[5806] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[5807] The protocol requires the following publicly available functions in ChipU:

[5808] Read[n, t, X] Advances R, and returns R, M_(t), S_(Kn)[X|R|C₁|M_(t)]. The time taken to calculate the signature must not be based on the contents of X, R, M_(t), or K.

[5809] WriteA[n, X, Y, Z] Advances R, replaces M₀ by Y subject to P_(T+n), and returns 1 only if S_(Kn)[R|X|C₁|Y]=Z. Otherwise returns 0. The time taken to calculate and compare signatures must be independent of data content. This function is identical to ChipT's Test function except that it additionally writes Y subject to P_(T+n) to its M when the signature matches.

[5810] Authenticated writes require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variables and function:

[5811] CountRemaining Part of M that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P_(0..T−1) for this part of M need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M (assuming ChipS's P allows that part of M to be updated).

[5812] Q Part of M that contains the write permissions for updating ChipU's M. By adding Q to ChipS we allow different ChipSs that can update different parts of M_(U). Permissions in ChipS's P_(0..T−1) for this part of M need to be ReadOnly once ChipS has been set up. Therefore Q can only be updated by another ChipS that will perform updates to that part of M.

[5813] SignM[n,V,W,X,Y,Z] Advances R, decrements CountRemaining and returns R, Z_(QX) (Z applied to X with permissions Q), S_(Kn)[W|R|C₁|Z_(QX)] only if Y=S_(Kn)[V|W|C₁|X] and CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[5814] To update ChipU's M vector:

[5815] a. System calls ChipU's Read function, passing in n1, 0 and 0 as the input parameters;

[5816] b. ChipU produces R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)] and returns these to System;

[5817] c. System calls ChipS's SignM function, passing in n2 (the key to be used in ChipS), 0 (as used in a), R_(U), M_(U0), S_(Kn1)[0|R_(U)|C₁|M_(U0)], and M_(D) (the desired vector to be written to ChipU);

[5818] d. ChipS produces R_(S), M_(QD) (processed by running M_(D) against M_(U0) using Q) and S_(Kn2)[R_(U)|R_(S)|C₁|M_(QD)] if the inputs were valid, and 0 for all outputs if the inputs were not valid.

[5819] e. If the values returned in d are non-zero, then ChipU is considered authentic. System can then call ChipU's WriteA function with these values from d.

[5820] f. ChipU should return a 1 to indicate success. A 0 should only be returned if the data generated by ChipS is incorrect (e.g. a transmission error).

[5821] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).

[5822] The data flow for authenticated writes is shown in FIG. 343 below.

[5823] Note that Q in ChipS is part of ChipS's M. This allows a user to set up ChipS with a permission set for upgrades. This should be done to ChipS and that part of M designated by P_(0..T−1) set to ReadOnly before ChipS is programmed with K_(U). If ChipS is programmed with K_(U) first, there is a risk of someone obtaining a half-setup ChipS and changing all of M_(U) instead of only the sections specified by Q.

[5824] In addition, CountRemaining in ChipS needs to be set up (including making it ReadOnly in P_(S)) before ChipS is programmed with K_(U). ChipS should therefore be programmed to only perform a limited number of SignM operations (thereby limiting compromise exposure if a ChipS is stolen). Thus ChipS would itself need to be upgraded with a new CountRemaining every so often.

[5825] 10.4.3 Updating Permissions for Future Writes

[5826] In order to reduce exposure to accidental and malicious attacks on P (and certain parts of M), only authorized users are allowed to update P. Writes to P are the same as authorized writes to M, except that they update P_(n) instead of M. Initially (at manufacture), P is set to be Read/Write for all M. As different processes fill up different parts of M, they can be sealed against future change by updating the permissions. Updating a chip's P_(0..T−1) changes permissions for unauthorized writes to M_(n), and updating P_(T..T+N−1) changes permissions for authorized writes with key K_(n).

[5827] P_(n) is only allowed to change to be a more restrictive form of itself. For example, initially all parts of M have permissions of Read/Write. A permission of Read/Write can be updated to Decrement Only or Read Only. A permission of Decrement Only can be updated to become Read Only. A Read Only permission cannot be further restricted.

[5828] In this transaction protocol, the System's chip is referred to as ChipS, and the chip being updated is referred to as ChipU. Each chip distrusts the other.

[5829] The protocol requires the following publicly available functions in ChipU:

[5830] Random

Returns R (does not advance R).

[5831] SetPermission[n,p,X,Y,Z] Advances R, and updates P_(p) according to Y and returns 1 followed by the resultant P_(p) only if S_(Kn)[R|X|Y|C₂]=Z. Otherwise returns 0. P_(p) can only become more restricted. Passing in 0 for any permission leaves it unchanged (passing in Y=0 returns the current P_(p)).

[5832] Authenticated writes of permissions require that the System has access to a ChipS that is capable of generating appropriate signatures. ChipS requires the following variables and function:

[5833] CountRemaining Part of ChipS's M₀ that contains the number of signatures that ChipS is allowed to generate. Decrements with each successful call to SignM and SignP. Permissions in ChipS's P_(0 . . . T−1) for this part of M₀ need to be ReadOnly once ChipS has been set up. Therefore CountRemaining can only be updated by another ChipS that will perform updates to that part of M₀ (assuming ChipS's P_(n) allows that part of M₀ to be updated).

[5834] SignP[n,X,Y] Advances R, decrements CountRemaining and returns R and S_(Kn)[X|R|Y|C₂] only if CountRemaining>0. Otherwise returns all 0s. The time taken to calculate and compare signatures must be independent of data content.

[5835] To update ChipU's P_(n):

[5836] a. System calls ChipU's Random function;

[5837] b. ChipU returns R_(U) to System;

[5838] c. System calls ChipS's SignP function, passing in n1, R_(U) and P_(D) (the desired P to be written to ChipU);

[5839] d. ChipS produces R_(S) and S_(Kn1)[R_(U)|R_(S)|P_(D)|C₂] if it is still permitted to produce signatures.

[5840] e. If the values returned in d are non-zero, System can then call ChipU's SetPermission function with n2, the desired permission entry p, R_(S), P_(D) and S_(Kn1)[R_(U)|R_(S)|P_(D)|C₂].

[5841] f. ChipU verifies the received signature against S_(Kn2)[R_(U)|R_(S)|P_(D)|C₂] and applies P_(D) to P_(n) if the signature matches.

[5842] g. System checks the first output parameter. 1=success, 0=failure.

[5843] The choice of n1 and n2 must be such that ChipU's K_(n1)=ChipS's K_(n2).

[5844] The data flow for authenticated writes to permissions is shown in FIG. 344 below.

[5845] 10.4.4 Protecting M in a Multiple Key Multiple M System

[5846] To protect the appropriate part of M_(n) against unauthorized writes, call SetPermission[n] for n=0 to T−1. To protect the appropriate part of M₀ against authorized writes with key n, call SetPermission[T+n] for n=0 to N−1.

[5847] Note that only M₀ can be written in an authenticated fashion.

[5848] Note that the SetPermission function must be called after the part of M has been set to the desired value.

[5849] For example, if adding a serial number to an area of M₁ that is currently ReadWrite so that no one is permitted to update the number again:

[5850] the Write function is called to write the serial number to M₁

[5851] SetPermission is called for 1 to set that part of M₁ to be ReadOnly for non-authorized writes.

[5852] If adding a consumable value to M₀ such that only keys 1-2 can update it, and keys 0, and 3-N cannot:

[5853] the Write function is called to write the amount of consumable to M₀

[5854] SetPermission is called for 0 to set that part of M₀ to be DecrementOnly for non-authorized writes. This allows the amount of consumable to decrement.

[5855] SetPermission is called for n={T, T+3, T+4 . . . , T+N−1} to set that part of M₀ to be ReadOnly for authorized writes using all but keys 1 and 2. This leaves keys 1 and 2 with ReadWrite permissions to M₀.
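The choice of permission entries in the consumable example can be checked with a short Python sketch of the P_(T+N) indexing: entries 0 to T−1 cover non-authenticated writes to M_(n), and entry T+n covers authenticated writes to M₀ with key n. The function names are hypothetical.

```python
def unauth_entry(t, num_vectors):
    """Index in P of the permissions for non-authenticated writes to M_t."""
    if not 0 <= t < num_vectors:
        raise IndexError("no such memory vector")
    return t


def auth_entry(n, num_vectors, num_keys):
    """Index in P of the permissions for authenticated writes to M_0 with key n."""
    if not 0 <= n < num_keys:
        raise IndexError("no such key")
    return num_vectors + n


def entries_to_lock(num_vectors, num_keys, allowed_keys):
    """Entries to set ReadOnly so that only allowed_keys can update that part of M_0."""
    return [auth_entry(n, num_vectors, num_keys)
            for n in range(num_keys) if n not in allowed_keys]
```

For T=2 and N=6 with keys 1 and 2 allowed, the entries to lock are T, T+3, T+4 and T+5, which matches the set n={T, T+3, T+4 . . . , T+N−1} given in the example.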

[5856] It is possible for someone who knows a key to further restrict other keys, but it is not in anyone's interest to do so.

[5857] 10.5 Programming K

[5858] This section is identical to the corresponding section of the multiple key single memory vector protocol (Section 9.5 on page 677). It is repeated here with reference to M₀ instead of M for CountRemaining.

[5859] In this case, we have a factory chip (ChipF) connected to a System. The System wants to program the key in another chip (ChipP). System wants to avoid passing the new key to ChipP in the clear, and also wants to avoid the possibility of the key-upgrade message being replayed on another ChipP (even if the user doesn't know the key).

[5860] The protocol is a simple extension of the single key protocol in that it assumes that ChipF and ChipP already share a secret key K_(old). This key is used to ensure that only a chip that knows K_(old) can set K_(new).

[5861] The protocol requires the following publicly available functions in ChipP:

[5862] Random

Returns R (does not advance R).

[5863] ReplaceKey[n, X, Y, Z] Replaces K_(n) by S_(Kn)[R|X|C₃]⊕Y, advances R, and returns 1 only if S_(Kn)[X|Y|C₃]=Z. Otherwise returns 0. The time taken to calculate signatures and compare values must be identical for all inputs.

[5864] And the following data and functions in ChipF:

[5865] CountRemaining Part of M₀ that contains the number of signatures that ChipF is allowed to generate. Decrements with each successful call to GetProgramKey. Permissions in P for this part of M₀ need to be ReadOnly once ChipF has been set up. Therefore it can only be updated by a ChipS that has authority to perform updates to that part of M₀.

[5866] K_(new) The new key to be transferred from ChipF to ChipP. Must not be visible.

[5867] SetPartialKey[X,Y] If word X of K_(new) has not yet been set, set word X of K_(new) to Y and return 1. Otherwise return 0. This function allows K_(new) to be programmed in multiple steps, thereby allowing different people or systems to know different parts of the key (but not the whole K_(new)). K_(new) is stored in ChipF's flash memory. Since there is a small number of ChipFs, it is theoretically not necessary to store the inverse of K_(new), but it is stronger protection to do so.

[5868] GetProgramKey[n,X] Advances R_(F), decrements CountRemaining, outputs R_(F), the encrypted key S_(Kn)[X|R_(F)|C₃]⊕K_(new) and a signature of the first two outputs plus C₃ if CountRemaining>0. Otherwise outputs 0. The time to calculate the encrypted key & signature must be identical for all inputs.

[5869] To update ChipP's key:

[5870] a. System calls ChipP's Random function;

[5871] b. ChipP returns R_(P) to System;

[5872] c. System calls ChipF's GetProgramKey function, passing in n1 (the desired key to use) and the result from b;

[5873] d. ChipF updates R_(F), then calculates and returns R_(F), S_(Kn1)[R_(P)|R_(F)|C₃]⊕K_(new), and S_(Kn1)[R_(F)|S_(Kn1)[R_(P)|R_(F)|C₃]⊕K_(new)|C₃];

[5874] e. If the response from d is not 0, System calls ChipP's ReplaceKey function, passing in n2 (the key to use in ChipP) and the response from d;

[5875] f. System checks response from ChipP. If the response is 1, then K_(Pn2) has been correctly updated to K_(new). If the response is 0, K_(Pn2) has not been updated.

[5876] The choice of n1 and n2 must be such that ChipF's K_(n1)=ChipP's K_(n2).
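The key update steps a to f can be sketched in Python as follows. This is an illustrative sketch only, not the chip implementation: S_K is taken to be HMAC-SHA1, the value of C₃ is a placeholder, advancing R is modelled by hashing it, and the class and function names are hypothetical. A single shared key stands in for K_(n1)=K_(n2), and the sketch makes no attempt at the identical-timing requirement beyond a constant-time comparison.

```python
import hashlib
import hmac
import os

C3 = b"\x03" * 20  # placeholder for the constant C3 (value assumed)


def sign(key: bytes, *parts: bytes) -> bytes:
    """S_K[a|b|...]: HMAC-SHA1 over the concatenated parts (assumed)."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def advance(r: bytes) -> bytes:
    """Stand-in for advancing R; the real chip's method is not modelled."""
    return hashlib.sha1(r).digest()


class ChipP:
    def __init__(self, key: bytes):
        self.key = key
        self.r = os.urandom(20)

    def random(self) -> bytes:
        return self.r  # returns R without advancing it

    def replace_key(self, x: bytes, y: bytes, z: bytes) -> int:
        # Accept only if Z is a valid signature over X|Y|C3.
        ok = hmac.compare_digest(sign(self.key, x, y, C3), z)
        if ok:
            # New key = S_K[R|X|C3] xor Y, computed with the old key and old R.
            self.key = xor(sign(self.key, self.r, x, C3), y)
            self.r = advance(self.r)
        return 1 if ok else 0


class ChipF:
    def __init__(self, key: bytes, k_new: bytes, count_remaining: int):
        self.key = key
        self.k_new = k_new
        self.count_remaining = count_remaining
        self.r = os.urandom(20)

    def get_program_key(self, r_p: bytes):
        if self.count_remaining <= 0:
            return 0
        self.r = advance(self.r)
        self.count_remaining -= 1
        encrypted = xor(sign(self.key, r_p, self.r, C3), self.k_new)
        return self.r, encrypted, sign(self.key, self.r, encrypted, C3)


def update_key(chip_f: ChipF, chip_p: ChipP) -> int:
    r_p = chip_p.random()                    # steps a and b
    response = chip_f.get_program_key(r_p)   # steps c and d
    if response == 0:
        return 0
    return chip_p.replace_key(*response)     # steps e and f
```

Because ChipP recomputes S_(Kn)[R_(P)|R_(F)|C₃] itself, XORing it with the received value recovers K_(new) without K_(new) ever appearing on the wire.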

[5877] The data flow for key updates is shown in FIG. 345 below.

[5878] Note that K_(new) is never passed in the open. An attacker could send its own R_(P), but cannot produce S_(Kn1)[R_(P)|R_(F)|C₃] without K_(n1). The signature based on K_(new) is sent to ensure that ChipP will be able to determine if either of the first two parameters has been changed en route.

[5879] CountRemaining needs to be set up in M_(F0) (including making it ReadOnly in P) before ChipF is programmed with K_(P). ChipF should therefore be programmed to only perform a limited number of GetProgramKey operations (thereby limiting compromise exposure if a ChipF is stolen). An authorized ChipS can be used to update this counter if necessary (see Section 9.4 on page 673).

[5880] 10.5.1 Chicken and Egg

[5881] As with the single key protocol, for the Program Key protocol to work, ChipF and ChipP must both know K_(old). Obviously both chips had to be programmed with K_(old), and thus K_(old) can be thought of as an older K_(new): K_(old) can be placed in chips if another ChipF knows K_(older), and so on.

[5882] Although this process allows a chain of reprogramming of keys, with each stage secure, at some stage the very first key (K_(first)) must be placed in the chips. K_(first) is in fact programmed with the chip's microcode at the manufacturing test station as the last step in manufacturing test. K_(first) can be a manufacturing batch key, changed for each batch or for each customer etc, and can have as short a life as desired. Compromising K_(first) need not result in a complete compromise of the chain of Ks. Depending on reprogramming requirements, K_(first) can be the same or different for all K_(n).

[5883] 10.5.2 Security Note

[5884] Different ChipFs should have different R_(F) values to prevent K_(new) from being determined as follows: The attacker needs 2 ChipFs, both with the same R_(F) and K_(n) but different values for K_(new). By knowing K_(new1) the attacker can determine K_(new2). The size of R_(F) is 2¹⁶⁰, and assuming a lifespan of approximately 2³² R_(S), an attacker needs about 2⁶⁰ ChipFs with the same K_(n) to locate the correct chip. Given that there are likely to be only hundreds of ChipFs with the same K_(n), this is not a likely attack. The attack can be eliminated completely by making C₃ different per chip and transmitting it with the new signature.

[5885] 11 Summary of Functions for All Protocols

[5886] All protocol sets, whether single key, multiple key, single M or multiple M, rely on the same set of functions. The function set is listed here:

[5887] 11.1 All Chips

[5888] Since every chip must act as ChipP, ChipA and potentially ChipU, all chips require the following functions:

[5889] Random

[5890] ReplaceKey

[5891] Read

[5892] Write

[5893] WriteA

[5894] SetPermissions

[5895] 11.2 ChipT

[5896] Chips that are to be used as ChipT also require:

[5897] Test

[5898] 11.3 ChipS

[5899] Chips that are to be used as ChipS also require either or both of:

[5900] SignM

[5901] SignP

[5902] 11.4 ChipF

[5903] Chips that are to be used as ChipF also require:

[5904] SetPartialKey

[5905] GetProgramKey

[5906] 12 Remote Upgrades

[5907] 12.1 Basic Remote Upgrades

[5908] Regardless of the number of keys and the number of memory vectors, the use of authenticated reads and writes, and of replacing a key without revealing K_(new) or K_(old), allows the possibility of remote upgrades of ChipU and ChipP. The upgrade typically involves a remote server and follows two basic steps:

[5909] a. During the first stage of the upgrade, the remote system authenticates the user's system to ensure the user's system has the setup that it claims to have.

[5910] b. During the second stage of the upgrade, the user's system authenticates the remote system to ensure that the upgrade is from a trusted source.

[5911] 12.1.1 User Requests Upgrade

[5912] The user requests an upgrade. This can be done by running a specific upgrade application on the user's computer, or by visiting a specific website.

[5913] 12.1.2 Remote System Gathers Info Securely About User's Current Setup

[5914] In this step, the remote system determines the current setup for the user. The current setup must be authenticated, to ensure that the user truly has the setup that is claimed. Traditionally, this has been done by checking the existence of files, generating checksums from those files, or by getting a serial number from a hardware dongle, although these traditional methods have difficulties since they can be generated locally by “hacked” software.

[5915] The authenticated read protocol described in Section 8.3 on page 664 can be used to accomplish this step. The use of random numbers has the advantage that the local user cannot capture a successful transaction and play it back on another computer system to fool the remote system.

[5916] 12.1.3 Remote System Gives User Choice of Upgrade Possibilities & User Chooses

[5917] If there is more than one upgrade possibility, the various upgrade options are now presented to the user. The upgrade options could vary based on a number of factors, including, but not limited to:

[5918] current user setup

[5919] user's preference for payment schemes (e.g. single payment vs. multiple payment)

[5920] number of other products owned by user

[5921] The user selects an appropriate upgrade and pays if necessary (by some scheme such as via a secure web site). What is important to note here is that the user chooses a specific upgrade and commences the upgrade operation.

[5922] 12.1.4 Remote System Sends Upgrade Request to Local System

[5923] The remote system now instructs the local system to perform the upgrade. However, the local system can only accept an upgrade from the remote system if the remote system is also authenticated. This is effectively an authenticated write. The use of R_(U) in the signature prevents the upgrade message from being replayed on another ChipU.

[5924] If multiple keys are used, and each chip has a unique key, the remote system can use a serial number obtained from the current setup (authenticated by a common key) to look up the unique key for use in the upgrade. Although the random number provides time varying messages, use of an unknown K that is different for each chip means that collection and examination of messages and their signatures is made even more difficult.

[5925] 12.2 OEM Upgrades

[5926] OEM upgrades are effectively the same as remote upgrades, except that the user interacts with an OEM server for upgrade selection. The OEM server may send sub-requests to the manufacturer's remote server to provide authentication, upgrade availability lists, and base-level pricing information.

[5927] An additional level of authentication may be incorporated into the protocol to ensure that upgrade requests are coming from the OEM server, and not from a 3rd party. This can readily be incorporated into both authentication steps.

[5928] 13 Choice of Signature Function

[5929] Given that all protocols make use of keyed signature functions, the choice of function is examined here.

[5930] Table 232 outlines the attributes of the applicable choices (see Section 5.2 on page 629 and Section 5.5 on page 636 for more information). The attributes are phrased so that the attribute is seen as an advantage.

TABLE 232 Attributes of Applicable Signature Functions
Columns: Triple DES, Blowfish, RC5, IDEA, Random Sequences, HMAC-MD5, HMAC-SHA1, HMAC-RIPEMD160
Free of patents: • • • • • •
Random key generation: • • •
Can be exported from the USA: • • • •
Fast: • • • •
Preferred Key Size (bits) for use in this application: 168¹, 128, 128, 128, 512, 128, 160, 160
Block size (bits): 64, 64, 64, 64, 256, 512, 512, 512
Cryptanalysis Attack-Free (apart from weak keys): • • • • •
Output size given input size N: ≧N, ≧N, ≧N, ≧N, 128, 128, 160, 160
Low storage requirements: • • • •
Low silicon complexity: • • • •
NSA designed: • •

[5931] An examination of Table 232 shows that the choice is effectively between the 3 HMAC constructs and the Random Sequence. The problem of key size and key generation eliminates the Random Sequence. Given that a number of attacks have already been carried out on MD5 and since the hash result is only 128 bits, HMAC-MD5 is also eliminated. The choice is therefore between HMAC-SHA1 and HMAC-RIPEMD160. Of these, SHA-1 is the preferred function, since:

[5932] SHA-1 has been more extensively cryptanalyzed without being broken;

[5933] SHA-1 requires slightly less intermediate storage than RIPE-MD-160;

[5934] SHA-1 is algorithmically less complex than RIPE-MD-160;

[5935] Although SHA-1 is slightly faster than RIPE-MD-160, this was not a reason for choosing SHA-1.

[5936] 13.1 HMAC-SHA1

[5937] The mechanism for authentication is the HMAC-SHA1 algorithm. This section examines the HMAC-SHA1 algorithm in greater detail than covered so far, and describes an optimization of the algorithm that requires fewer memory resources than the original definition.

[5938] 13.1.1 HMAC

[5939] Given the following definitions:

[5940] H=the hash function (e.g. MD5 or SHA-1)

[5941] n=number of bits output from H (e.g. 160 for SHA-1, 128 bits for MD5)

[5942] M=the data to which the MAC function is to be applied

[5943] K=the secret key shared by the two parties

[5944] ipad=0x36 repeated 64 times

[5945] opad=0x5C repeated 64 times

[5946] The HMAC algorithm is as follows:

[5947] a. Extend K to 64 bytes by appending 0x00 bytes to the end of K

[5948] b. XOR the 64 byte string created in (a) with ipad

[5949] c. Append data stream M to the 64 byte string created in (b)

[5950] d. Apply H to the stream generated in (c)

[5951] e. XOR the 64 byte string created in (a) with opad

[5952] f. Append the H result from (d) to the 64 byte string resulting from (e)

[5953] g. Apply H to the output of (f) and output the result

[5954] Thus:

HMAC[M]=H[(K ⊕ opad)|H[(K ⊕ ipad)|M]]

[5955] The HMAC-SHA1 algorithm is simply HMAC with H=SHA-1.
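As an illustration, the lettered steps above can be followed literally in Python using the standard hashlib SHA-1 as H. This is a sketch under the assumption that K is at most 64 bytes (the steps above do not cover longer keys); the function name hmac_sha1 is hypothetical.

```python
import hashlib
import hmac

BLOCK = 64  # bytes in one SHA-1 input block


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    if len(key) > BLOCK:
        raise ValueError("sketch assumes a key of at most 64 bytes")
    k = key.ljust(BLOCK, b"\x00")                        # step a: pad K with 0x00
    inner_pad = bytes(b ^ 0x36 for b in k)               # step b: K xor ipad
    inner = hashlib.sha1(inner_pad + message).digest()   # steps c and d
    outer_pad = bytes(b ^ 0x5C for b in k)               # step e: K xor opad
    return hashlib.sha1(outer_pad + inner).digest()      # steps f and g


# The result can be cross-checked against the standard library's HMAC.
key = bytes(range(20))
assert hmac_sha1(key, b"message") == hmac.new(key, b"message", hashlib.sha1).digest()
```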

[5956] 13.1.2 SHA-1

[5957] The SHA-1 hashing algorithm is described in the context of other hashing algorithms in Section 5.5.3.3 on page 640, and completely defined in [28]. The algorithm is summarized here.

[5958] Nine 32-bit constants are defined in Table 233. There are 5 constants used to initialize the chaining variables, and there are 4 additive constants.

TABLE 233 Constants used in SHA-1
Initial Chaining Values    Additive Constants
h₁ 0x67452301              y₁ 0x5A827999
h₂ 0xEFCDAB89              y₂ 0x6ED9EBA1
h₃ 0x98BADCFE              y₃ 0x8F1BBCDC
h₄ 0x10325476              y₄ 0xCA62C1D6
h₅ 0xC3D2E1F0

[5959] Non-optimized SHA-1 requires a total of 2912 bits of data storage:

[5960] Five 32-bit chaining variables are defined: H₁, H₂, H₃, H₄ and H₅.

[5961] Five 32-bit working variables are defined: A, B, C, D, and E.

[5962] One 32-bit temporary variable is defined: t.

[5963] Eighty 32-bit temporary registers are defined: X₀₋₇₉.

[5964] The following functions are defined for SHA-1:

TABLE 234 Functions used in SHA-1
Symbolic Nomenclature    Description
+                        Addition modulo 2³²
X <<< Y                  Result of rotating X left through Y bit positions
f(X, Y, Z)               (X ∧ Y) ∨ (¬X ∧ Z)
g(X, Y, Z)               (X ∧ Y) ∨ (X ∧ Z) ∨ (Y ∧ Z)
h(X, Y, Z)               X ⊕ Y ⊕ Z

[5965] The hashing algorithm consists of firstly padding the input message to be a multiple of 512 bits and initializing the chaining variables H₁₋₅ with h₁₋₅. The padded message is then processed in 512-bit chunks, with the output hash value being the final 160-bit value given by the concatenation of the chaining variables: H₁|H₂|H₃|H₄|H₅.

[5966] The steps of the SHA-1 algorithm are now examined in greater detail.

[5967] 13.1.2.1 Step 1. Preprocessing

[5968] The first step of SHA-1 is to pad the input message to be a multiple of 512 bits as follows and to initialize the chaining variables.

TABLE 235 Steps to follow to preprocess the input message
Pad the input message: Append a 1 bit to the message. Append 0 bits such that the length of the padded message is 64 bits short of a multiple of 512 bits. Append a 64-bit value containing the length in bits of the original input message. Store the length as most significant bit through to least significant bit.
Initialize the chaining variables: H₁ ← h₁, H₂ ← h₂, H₃ ← h₃, H₄ ← h₄, H₅ ← h₅

[5969] 13.1.2.2 Step 2. Processing

[5970] The padded input message is processed in 512-bit blocks. Each 512-bit block is in the form of 16 32-bit words, referred to as InputWord₀₋₁₅.

TABLE 236 Steps to follow for each 512 bit block (InputWord₀₋₁₅)
Copy the 512 input bits into X₀₋₁₅: For j = 0 to 15: X_(j) = InputWord_(j)
Expand X₀₋₁₅ into X₁₆₋₇₉: For j = 16 to 79: X_(j) ← ((X_(j−3) ⊕ X_(j−8) ⊕ X_(j−14) ⊕ X_(j−16)) <<< 1)
Initialize working variables: A ← H₁, B ← H₂, C ← H₃, D ← H₄, E ← H₅
Round 1: For j = 0 to 19: t ← ((A <<< 5) + f(B, C, D) + E + X_(j) + y₁); E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 2: For j = 20 to 39: t ← ((A <<< 5) + h(B, C, D) + E + X_(j) + y₂); E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 3: For j = 40 to 59: t ← ((A <<< 5) + g(B, C, D) + E + X_(j) + y₃); E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 4: For j = 60 to 79: t ← ((A <<< 5) + h(B, C, D) + E + X_(j) + y₄); E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Update chaining variables: H₁ ← H₁ + A, H₂ ← H₂ + B, H₃ ← H₃ + C, H₄ ← H₄ + D, H₅ ← H₅ + E

[5971] Apart from the range of j, the rounds differ only in the function applied to B, C and D (f, h, g and h respectively) and in the additive constant (y₁ to y₄).

[5972] 13.1.2.3 Step 3. Completion

[5973] After all the 512-bit blocks of the padded input message have been processed, the output hash value is the final 160-bit value given by: H₁|H₂|H₃|H₄|H₅.
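Steps 1 to 3 can be sketched in Python as below, as an illustration rather than a reference implementation. The constants are those of Table 233; the function names (rotl, pad, sha1) are hypothetical, and the result is cross-checked against Python's hashlib.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
Y = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left through n bit positions."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(x, y, z): return (x & y) | (~x & MASK & z)
def g(x, y, z): return (x & y) | (x & z) | (y & z)
def h(x, y, z): return x ^ y ^ z


def pad(message: bytes) -> bytes:
    """Step 1: append a 1 bit, 0 bits, then the 64-bit big-endian bit length."""
    zeros = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zeros + struct.pack(">Q", 8 * len(message))


def sha1(message: bytes) -> bytes:
    H = list(H_INIT)
    data = pad(message)
    for off in range(0, len(data), 64):           # Step 2: one 512-bit block at a time
        X = list(struct.unpack(">16I", data[off:off + 64]))
        for j in range(16, 80):                   # expand X0-15 into X16-79
            X.append(rotl(X[j - 3] ^ X[j - 8] ^ X[j - 14] ^ X[j - 16], 1))
        A, B, C, D, E = H
        for j in range(80):
            # Rounds 1 to 4 use f, h, g, h with y1 to y4 respectively.
            fn, y = ((f, Y[0]), (h, Y[1]), (g, Y[2]), (h, Y[3]))[j // 20]
            t = (rotl(A, 5) + fn(B, C, D) + E + X[j] + y) & MASK
            E, D, C, B, A = D, C, rotl(B, 30), A, t
        H = [(a + b) & MASK for a, b in zip(H, (A, B, C, D, E))]
    return struct.pack(">5I", *H)                 # Step 3: H1|H2|H3|H4|H5


assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```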

[5974] 13.1.2.4 Optimization for Hardware Implementation

[5975] The SHA-1 Step 2 procedure is not optimized for hardware. In particular, the 80 temporary 32-bit registers use up valuable silicon on a hardware implementation. This section describes an optimization to the SHA-1 algorithm that only uses 16 temporary registers. The reduction in silicon is from 2560 bits down to 512 bits, a saving of over 2000 bits. It may not be important in some applications, but in the QA Chip storage space must be reduced where possible.

[5976] The optimization is based on the fact that although the original 16-word message block is expanded into an 80-word message block, the 80 words are not updated during the algorithm. In addition, the words rely on the previous 16 words only, and hence the expanded words can be calculated on-the-fly during processing, as long as we keep 16 words for the backward references. We require rotating counters to keep track of which register we are up to using, but the effect is to save a large amount of storage.

[5977] Rather than index X by a single value j, we use a 5 bit counter to count through the iterations. This can be achieved by initializing a 5-bit register with either 16 or 20, and decrementing it until it reaches 0. In order to update the 16 temporary variables as if they were 80, we require 4 indexes, each a 4-bit register. All 4 indexes increment (with wraparound) during the course of the algorithm.

TABLE 237 Optimised Steps to follow for each 512 bit block (InputWord₀₋₁₅)
Initialize working variables: A ← H₁, B ← H₂, C ← H₃, D ← H₄, E ← H₅; N₁ ← 13, N₂ ← 8, N₃ ← 2, N₄ ← 0
Round 0 (copy the 512 input bits into X₀₋₁₅): Do 16 times: X_(N4) = InputWord_(N4); [increment N₁, N₂, N₃]_(optional); increment N₄
Round 1A: Do 16 times: t ← ((A <<< 5) + f(B, C, D) + E + X_(N4) + y₁); [increment N₁, N₂, N₃]_(optional); increment N₄; E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 1B: Do 4 times: X_(N4) ← ((X_(N1) ⊕ X_(N2) ⊕ X_(N3) ⊕ X_(N4)) <<< 1); t ← ((A <<< 5) + f(B, C, D) + E + X_(N4) + y₁); increment N₁, N₂, N₃, N₄; E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 2: Do 20 times: X_(N4) ← ((X_(N1) ⊕ X_(N2) ⊕ X_(N3) ⊕ X_(N4)) <<< 1); t ← ((A <<< 5) + h(B, C, D) + E + X_(N4) + y₂); increment N₁, N₂, N₃, N₄; E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 3: Do 20 times: X_(N4) ← ((X_(N1) ⊕ X_(N2) ⊕ X_(N3) ⊕ X_(N4)) <<< 1); t ← ((A <<< 5) + g(B, C, D) + E + X_(N4) + y₃); increment N₁, N₂, N₃, N₄; E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Round 4: Do 20 times: X_(N4) ← ((X_(N1) ⊕ X_(N2) ⊕ X_(N3) ⊕ X_(N4)) <<< 1); t ← ((A <<< 5) + h(B, C, D) + E + X_(N4) + y₄); increment N₁, N₂, N₃, N₄; E ← D, D ← C, C ← (B <<< 30), B ← A, A ← t
Update chaining variables: H₁ ← H₁ + A, H₂ ← H₂ + B, H₃ ← H₃ + C, H₄ ← H₄ + D, H₅ ← H₅ + E

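[5978] Apart from the iteration counts, the rounds differ in whether X_(N4) is first recomputed (Rounds 1B to 4 only), in the function applied to B, C and D (f, f, h, g and h respectively), and in the additive constant (y₁ to y₄).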

[5979] The incrementing of N₁, N₂, and N₃ during Rounds 0 and 1A is optional. A software implementation would not increment them, since it takes time, and at the end of the 16 times through the loop, all 4 counters will be at their original values. Designers of hardware may wish to increment all 4 counters together to save on control logic.

[5980] Round 0 can be completely omitted if the caller loads the 512 bits of X₀₋₁₅.
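The 16-register variant can be sketched in Python as below. This is an illustrative software model, not the hardware design: the names process_block, sha1_16 and ROUNDS are hypothetical, the iteration counter is modelled by ordinary loops, and all four indexes are incremented together in Round 1A (the hardware-style option), which leaves N₁ to N₃ at their initial values after 16 steps. The output is cross-checked against hashlib.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
Y1, Y2, Y3, Y4 = 0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6


def rotl(x, n): return ((x << n) | (x >> (32 - n))) & MASK
def f(x, y, z): return (x & y) | (~x & MASK & z)
def g(x, y, z): return (x & y) | (x & z) | (y & z)
def h(x, y, z): return x ^ y ^ z


# (iterations, function, additive constant, recompute X[N4] first?)
ROUNDS = (
    (16, f, Y1, False),  # Round 1A: words used as loaded
    (4, f, Y1, True),    # Round 1B
    (20, h, Y2, True),   # Round 2
    (20, g, Y3, True),   # Round 3
    (20, h, Y4, True),   # Round 4
)


def process_block(H, input_words):
    X = [0] * 16                      # only 16 temporary registers
    A, B, C, D, E = H
    n1, n2, n3, n4 = 13, 8, 2, 0
    for _ in range(16):               # Round 0: load the block (N1..N3 untouched)
        X[n4] = input_words[n4]
        n4 = (n4 + 1) % 16
    for count, fn, y, expand in ROUNDS:
        for _ in range(count):
            if expand:                # overwrite the oldest word with the new one
                X[n4] = rotl(X[n1] ^ X[n2] ^ X[n3] ^ X[n4], 1)
            t = (rotl(A, 5) + fn(B, C, D) + E + X[n4] + y) & MASK
            n1, n2, n3, n4 = ((n + 1) % 16 for n in (n1, n2, n3, n4))
            E, D, C, B, A = D, C, rotl(B, 30), A, t
    return [(a + b) & MASK for a, b in zip(H, (A, B, C, D, E))]


def sha1_16(message: bytes) -> bytes:
    data = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    data += struct.pack(">Q", 8 * len(message))
    H = list(H_INIT)
    for off in range(0, len(data), 64):
        H = process_block(H, struct.unpack(">16I", data[off:off + 64]))
    return struct.pack(">5I", *H)


assert sha1_16(b"abc") == hashlib.sha1(b"abc").digest()
```

At the first step of Round 1B the indexes 13, 8, 2 and 0 select X_(j−3), X_(j−8), X_(j−14) and X_(j−16) for j = 16, which is why those initial values are used.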

[5981] 14 Holding Out Against Attacks

[5982] The authentication protocols described in Section 7 on page 661 onward should be resistant to defeat by logical means. This section details each type of attack in turn with reference to the Read Authentication protocol.

[5983] 14.1 Brute Force Attack

[5984] A brute force attack is guaranteed to break any protocol. However the length of the key means that the time for an attacker to perform a brute force attack is too long to be worth the effort.

[5985] An attacker only needs to break K to build a clone authentication chip. A brute force attack on K must therefore break a 160-bit key.

[5986] An attack against K requires a maximum of 2¹⁶⁰ attempts, with a 50% chance of finding the key after only 2¹⁵⁹ attempts. Assuming an array of a trillion processors, each running one million tests per second, 2¹⁵⁹ (7.3×10⁴⁷) tests take 2.3×10²² years, which is longer than the total lifetime of the universe. There are around 100 million personal computers in the world. Even if these were all connected in an attack (e.g. via the Internet), this number is still 10,000 times smaller than the trillion-processor attack described. Further, if the manufacture of one trillion processors becomes a possibility in the age of nanocomputers, the time taken to obtain the key is still longer than the total lifetime of the universe.
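The arithmetic behind these figures can be reproduced with a few lines of Python. The only assumption beyond the text is a year of 365.25 days; the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed length of a year

tests = 2 ** 159                        # attempts for a 50% chance of success
processors = 10 ** 12                   # a trillion processors
tests_per_second = 10 ** 6              # one million tests per second each

years = tests / (processors * tests_per_second) / SECONDS_PER_YEAR
pc_ratio = processors // 10 ** 8        # trillion processors vs 100 million PCs

print(f"{tests:.1e} tests, about {years:.1e} years; PC ratio {pc_ratio}")
```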

[5987] 14.2 Guessing the Key Attack

[5988] It is theoretically possible that an attacker can simply “guess the key”. In fact, given enough time, and trying every possible number, an attacker will obtain the key. This is identical to the brute force attack described above, where 2¹⁵⁹ attempts must be made before a 50% chance of success is obtained.

[5989] The chance of someone simply guessing the key on the first try is 1 in 2¹⁶⁰. For comparison, the chance of someone winning the top prize in a U.S. state lottery and being killed by lightning in the same day is only 1 in 2⁶¹ [78]. Guessing the authentication chip key on the first go is comparable to two people choosing exactly the same atoms from a choice of all the atoms in the Earth, i.e. extremely unlikely.

[5990] 14.3 Quantum Computer Attack

[5991] To break K, a quantum computer containing 160 qubits embedded in an appropriate algorithm must be built. As described in Section 5.7.1.7 on page 648, an attack against a 160-bit key is not feasible. An outside estimate of the possibility of quantum computers is that 50 qubits may be achievable within 50 years. Even using a 50 qubit quantum computer, 2¹¹⁰ tests are required to crack a 160 bit key. Assuming an array of 1 billion 50 qubit quantum computers, each able to try 2⁵⁰ keys in 1 microsecond (beyond the current wildest estimates), finding the key would take an average of 18 billion years.

[5992] 14.4 Ciphertext Only Attack

[5993] An attacker can launch a ciphertext only attack on K by monitoring calls to Random and Read.

[5994] However, given that all these calls also reveal the plaintext as well as the hashed form of the plaintext, the attack would be transformed into a stronger form of attack: a known plaintext attack.

[5995] 14.5 Known Plaintext Attack

[5996] It is easy to connect a logic analyzer to the connection between the System and the authentication chip, and thereby monitor the flow of data. This flow of data results in known plaintext and the hashed form of the plaintext, which can therefore be used to launch a known plaintext attack against K.

[5997] To launch an attack against K, multiple calls to Random and Test must be made (with the call to Test being successful, and therefore requiring a call to Read on a valid chip). This is straightforward, requiring the attacker to have both a system authentication chip and a consumable authentication chip. For each set of calls, an X, S_(K)[X] pair is revealed. The attacker must collect these pairs for further analysis.

[5998] The question arises of how many pairs must be collected for a meaningful attack to be launched with this data. An example of an attack that requires collection of data for statistical analysis is differential cryptanalysis (see Section 14.13 on page 703). However, there are no known attacks against SHA-1 or HMAC-SHA1 [7][7][7], so there is no use for the collected data at this time.

[5999] 14.6 Chosen Plaintext Attacks

[6000] The golden rule for the QA Chip is that it never signs something that is simply given to it, i.e. it never lets the user choose the message that is signed.

[6001] Although the attacker can choose both R_(T) and possibly M, ChipA advances its random number R_(A) with each call to Read. The resultant message X therefore contains 160 bits of changing data each call that are not chosen by the attacker.

[6002] To launch a chosen text attack the attacker would need to locate a chip whose R was the desired R. This makes the search effectively impossible.

[6003] 14.7 Adaptive Chosen Plaintext Attacks

[6004] The HMAC construct provides security against all forms of chosen plaintext attacks [7]. This is primarily because the HMAC construct has 2 secret input variables (the result of the original hash, and the secret key). Thus finding collisions in the hash function itself when the input variable is secret is even harder than finding collisions in the plain hash function. This is because the former requires direct access to SHA-1 in order to generate pairs of input/output from SHA-1.

[6005] Since R changes with each call to Read, the user cannot choose the complete message. The only value that can be collected by an attacker is HMAC[R₁|R₂|M₂]. These are not attacks against the SHA-1 hash function itself, and reduce the attack to a differential cryptanalysis attack (see Section 14.13 on page 703), examining statistical differences between collected data. Given that there is no differential cryptanalysis attack known against SHA-1 or HMAC, the protocols are resistant to the adaptive chosen plaintext attacks.

[6006] 14.8 Purposeful Error Attack

[6007] An attacker can only launch a purposeful error attack on the Test function, since this is the only function in the Read protocol that validates input against the keys.

[6008] With the Test function, a 0 value is produced if an error is found in the input; no further information is given. In addition, the time taken to produce the 0 result is independent of the input, giving the attacker no information about which bit(s) were wrong.

[6009] A purposeful error attack is therefore fruitless.

[6010] 14.9 Chaining Attack

[6011] Any form of chaining attack assumes that the message to be hashed is over several blocks, or the input variables can somehow be set. The HMAC-SHA1 algorithm used by Protocol C1 only ever hashes one or two 512-bit blocks. Chaining attacks are not possible when only one block is used, and are extremely limited when two blocks are used.

[6012] 14.10 Birthday Attack

[6013] The strongest attack known against HMAC is the birthday attack, based on the frequency of collisions for the hash function [7][7]. However this is totally impractical for minimally reasonable hash functions such as SHA-1. And the birthday attack is only possible when the attacker has control over the message that is hashed.

[6014] Since in the protocols described for the QA Chip, the message to be signed is never chosen by the attacker (at least one 160-bit R value is chosen by the chip doing the signing), the attacker has no control over the message that is hashed. An attacker must instead search for a collision message that hashes to the same value (analogous to finding one person who shares your birthday).

[6015] The clone chip must therefore attempt to find a new value R₂ such that the hash of R₁, R₂ and a chosen M₂ yields the same hash value as H[R₁|R₂|M]. However ChipT does not reveal the correct hash value (the Test function only returns 1 or 0 depending on whether the hash value is correct).

[6016] Therefore the only way of finding out the correct hash value (in order to find a collision) is to interrogate a real ChipA. But to find the correct value means to update M, and since the decrement-only parts of M are one-way, and the read-only parts of M cannot be changed, a clone consumable would have to update a real consumable before attempting to find a collision. The alternative is a brute force attack search on the Test function to find a success (requiring each clone consumable to have access to a System consumable). A brute force search, as described above, takes longer than the lifetime of the universe, in this case, per authentication.

[6017] There is no point for a clone consumable to launch this kind of attack.

[6018] 14.11 Substitution with a Complete Lookup Table

[6019] The random number seed in each System is 160 bits. The best case situation for an attacker is that no state data has been changed. Assuming also that the clone consumable does not advance its R, there is a constant value returned as M. A clone chip must therefore return S_(K)[R|c] (where c is a constant), which is a 160 bit value.

[6020] Assuming a 160-bit lookup of a 160-bit result, this requires 2.9×10⁴⁹ bytes, or 2.6×10³⁷ terabytes, certainly more space than is feasible for the near future. This of course does not even take into account the method of collecting the values for the ROM. A complete lookup table is therefore completely impossible.
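The size of the complete lookup table follows directly from the figures above, as this short Python calculation shows. It assumes one 20-byte (160-bit) result per possible 160-bit input and takes a terabyte to be 2⁴⁰ bytes; the variable names are illustrative.

```python
entries = 2 ** 160                 # one entry per possible 160-bit input
bytes_per_entry = 160 // 8         # each result is a 160-bit (20-byte) value

total_bytes = entries * bytes_per_entry
total_terabytes = total_bytes / 2 ** 40   # assuming 1 terabyte = 2**40 bytes

print(f"{total_bytes:.2e} bytes, {total_terabytes:.2e} terabytes")
```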

[6021] 14.12 Substitution with a Sparse Lookup Table

[6022] A sparse lookup table is only feasible if the messages sent to the authentication chip are somehow predictable, rather than effectively random.

[6023] The random number R is seeded with an unknown random number, gathered from a naturally random event. There is no possibility for a clone manufacturer to know what the possible range of R is for all Systems, since each bit has an unrelated chance of being 1 or 0.

[6024] Since the range of R in all systems is unknown, it is not possible to build a sparse lookup table that can be used in all systems. The general sparse lookup table is therefore not a possible attack.

[6025] However, it is possible for a clone manufacturer to know what the range of R is for a given System. This can be accomplished by loading a LFSR with the current result from a call to a specific System authentication chip's Random function, and iterating some number of times into the future. If this is done, a special ROM can be built which will only contain the responses for that particular range of R, i.e. a ROM specifically for the consumables of that particular System. But the attacker still needs to place correct information in the ROM. The attacker will therefore need to find a valid authentication chip and call it for each of the values in R. Suppose the clone authentication chip reports a full consumable, and then allows a single use before simulating loss of connection and insertion of a new full consumable. The clone consumable would therefore need to contain responses for authentication of a full consumable and authentication of a partially used consumable. The worst case ROM contains entries for full and partially used consumables for R over the lifetime of System. However, a valid authentication chip must be used to generate the information, and be partially used in the process. If a given System only produces n R-values, the sparse lookup-ROM required is 20n bytes (20=160/8) multiplied by the number of different values for M. The time taken to build the ROM depends on the amount of time enforced between calls to Read.

[6026] After all this, the clone manufacturer must rely on the consumer returning for a refill, since the cost of building the ROM in the first place consumes a single consumable. The clone manufacturer's business in such a situation is consequently in the refills.

[6027] The time and cost, then, depend on the size of R and the number of different values for M that must be incorporated in the lookup. In addition, a custom clone consumable ROM must be built to match each and every System, and a different valid authentication chip must be used for each System (in order to provide the full and partially used data). The use of an authentication chip in a System must therefore be examined to determine whether or not this kind of attack is worthwhile for a clone manufacturer.

[6028] As an example, consider a camera system that has about 10,000 prints in its lifetime. Assume it has a single Decrement Only value (number of prints remaining), and a delay of 1 second between calls to Read. In such a system, the sparse table will take about 3 hours to build, and consumes 100K. Remember that the construction of the ROM requires the consumption of a valid authentication chip, so any money charged must be worth more than a single consumable and the clone consumable combined. Thus it is not cost effective to perform this function for a single consumable (unless the clone consumable somehow contained the equivalent of multiple authentic consumables).

[6029] If a clone manufacturer is going to go to the trouble of building a custom ROM for each owner of a System, an easier approach would be to update System to completely ignore the authentication chip.

[6030] Consequently, this attack is possible as a per-System attack, and a decision must be made about the chance of this occurring for a given System/Consumable combination. The chance will depend on the cost of the consumable and authentication chips, the longevity of the consumable, the profit margin on the consumable, the time taken to generate the ROM, the size of the resultant ROM, and whether customers will come back to the clone manufacturer for refills that use the same clone chip etc.

[6031] 14.13 Differential Cryptanalysis

[6032] Existing differential attacks are heavily dependent on the structure of S boxes, as used in DES and other similar algorithms. Although HMAC-SHA1 has no S boxes, an attacker can undertake a differential-like attack by undertaking statistical analysis of:

[6033] Minimal-difference inputs, and their corresponding outputs

[6034] Minimal-difference outputs, and their corresponding inputs

[6035] To launch an attack of this nature, sets of input/output pairs must be collected. The collection can be via known plaintext, or from a partially adaptive chosen plaintext attack. Obviously the latter, being chosen, will be more useful.

[6036] Hashing algorithms in general are designed to be resistant to differential analysis. SHA-1 in particular has been specifically strengthened, especially by the 80 word expansion so that minimal differences in input will still produce outputs that vary in a larger number of bit positions (compared to 128 bit hash functions). In addition, the information collected is not a direct SHA-1 input/output set, due to the nature of the HMAC algorithm. The HMAC algorithm hashes a known value with an unknown value (the key), and the result of this hash is then rehashed with a separate unknown value. Since the attacker does not know the secret value, nor the result of the first hash, the inputs and outputs from SHA-1 are not known, making any differential attack extremely difficult.

[6037] There are no known differential attacks against SHA-1 or HMAC-SHA-1 [56][56].

[6038] The following is a more detailed discussion of minimally different inputs and outputs from the QA Chip.

[6039] 14.13.1 Minimal Difference Inputs

[6040] This is where an attacker takes a set of X, S_(K)[X] values where the X values are minimally different, and examines the statistical differences between the outputs S_(K)[X]. The attack relies on X values that only differ by a minimal number of bits. The question then arises as to how to obtain minimally different X values in order to compare the S_(K)[X] values.

[6041] Although the attacker can choose both R_(T) and possibly M, ChipA advances its random number R_(A) with each call to Read. The resultant X therefore contains 160 bits of changing data each call, and is therefore not minimally different.

[6042] 14.13.2 Minimal Difference Outputs

[6043] This is where an attacker takes a set of X, S_(K)[X] values where the S_(K)[X] values are minimally different, and examines the statistical differences between the X values. The attack relies on S_(K)[X] values that only differ by a minimal number of bits.

[6044] There is no way for an attacker to generate an X value for a given S_(K)[X]. To do so would violate the fact that S is a one-way function (HMAC-SHA1). Consequently the only way for an attacker to mount an attack of this nature is to record all observed X, S_(K)[X] pairs in a table. A search must then be made through the observed values for enough minimally different S_(K)[X] values to undertake a statistical analysis of the X values.

[6045] 14.14 Message Substitution Attacks

[6046] In order for this kind of attack to be carried out, a clone consumable must contain a real authentication chip, but one that is effectively reusable since it never gets decremented. The clone authentication chip would intercept messages, and substitute its own. However, this attack does not give success to the attacker.

[6047] A clone authentication chip may choose not to pass on a Write command to the real authentication chip. However, the subsequent Read command must return the correct response (as if the Write had succeeded). To return the correct response, the hash value must be known for the specific R and M. An attacker can only determine the hash value by actually updating M in a real chip, which the attacker does not want to do. Even changing the R sent by System does not help since the System authentication chip must match the R during a subsequent Test.

[6048] A message substitution attack would therefore be unsuccessful. This is only true if System updates the amount of consumable remaining before it is used.

[6049] 14.15 Reverse Engineering the Key Generator

[6050] If a pseudo-random number generator is used to generate keys, there is the potential for a clone manufacturer to obtain the generator program or to deduce the random seed used. This was the way in which the security layer of the Netscape browser was initially broken [33].

[6051] 14.16 Bypassing the Authentication Process

[6052] The System should ideally update the consumable state data before the consumable is used, and follow every write by a read (to authenticate the write). Thus each use of the consumable requires an authentication. If the System adheres to these two simple rules, a clone manufacturer will have to simulate authentication via a method above (such as sparse ROM lookup).

[6053] 14.17 Reuse of Authentication Chips

[6054] Each use of the consumable requires an authentication. If a consumable has been used up, then its authentication chip will have had the appropriate state-data values decremented to 0. The chip can therefore not be used in another consumable.

[6055] Note that this only holds true for authentication chips that hold Decrement-Only data items. If there is no state data decremented with each usage, there is nothing stopping the reuse of the chip. This is the basic difference between Presence-Only authentication and Consumable Lifetime authentication. All described protocols allow both.

[6056] The bottom line is that if a consumable has Decrement Only data items that are used by the System, the authentication chip cannot be reused without being completely reprogrammed by a valid programming station that has knowledge of the secret key (e.g. an authorized refill station).

[6057] 14.18 Management Decision to Omit Authentication to Save Costs

[6058] Although not strictly an external attack, a decision to omit authentication in future Systems in order to save costs will have widely varying effects on different markets.

[6059] In the case of high volume consumables, it is essential to remember that it is very difficult to introduce authentication after the market has started, as systems requiring authenticated consumables will not work with older consumables still in circulation. Likewise, it is impractical to discontinue authentication at any stage, as older Systems will not work with the new, unauthenticated, consumables. In the second case, older Systems can be individually altered by replacing the System program code.

[6060] Without any form of protection, illegal cloning of high volume consumables is almost certain. However, with the patent and copyright protection, the probability of illegal cloning may be, say, 50%. However, this is not the only loss possible. If a clone manufacturer were to introduce clone consumables which caused damage to the System (e.g. clogged nozzles in a printer due to poor quality ink), then the loss in market acceptance, and the expense of warranty repairs, may be significant.

[6061] In the case of a specialized pairing, such as a car/car-keys, or door/door-key, or some other similar situation, the omission of authentication in future systems is trivial and without repercussions. This is because the consumer is sold the entire set of System and Consumable authentication chips at the one time.

[6062] 14.19 Garrote/Bribe Attack

[6063] If humans do not know the key, there is no amount of force or bribery that can reveal it. The use of ChipF and the ReplaceKey protocol is specifically designed to avoid the requirement of the programming station having to know the new key. However, ChipF must be told the new key at some stage, and therefore it is the person(s) who enter the new key into ChipF that are at risk.

[6064] The level of security against this kind of attack is ultimately a decision for the System/Consumable owner, to be made according to the desired level of service.

[6065] For example, a car company may wish to keep a record of all keys manufactured, so that a person can request a new key to be made for their car. However, this allows the potential compromise of the entire key database, allowing an attacker to make keys for any of the manufacturer's existing cars. It does not allow an attacker to make keys for any new cars. Of course, the key database itself may also be encrypted with a further key that requires a certain number of people to combine their key portions together for access. If no record is kept of which key is used in a particular car, there is no way to make additional keys should one become lost. Thus an owner will have to replace his car's authentication chip and all his car-keys. This is not necessarily a bad situation.

[6066] By contrast, in a consumable such as a printer ink cartridge, the one key combination is used for all Systems and all consumables. Certainly if no backup of the keys is kept, there is no human with knowledge of the key, and therefore no attack is possible. However, a no-backup situation is not desirable for a consumable such as ink cartridges, since if the key is lost no more consumables can be made. The manufacturer should therefore keep a backup of the key information in several parts, where a certain number of people must together combine their portions to reveal the full key information. This may be required in case the chip programming station needs to be reloaded.

[6067] In any case, none of these attacks are against the authenticated read protocol, since no humans are involved in the authentication process.

[6068] Logical Interface

[6069] 15 Introduction

[6070] The QA Chip has a physical and a logical external interface. The physical interface defines how the QA Chip can be connected to a physical System, while the logical interface determines how that System can communicate with the QA Chip. This section deals with the logical interface.

[6071] 15.1 Operating Modes

[6072] The QA Chip has four operating modes: Idle Mode, Program Mode, Trim Mode and Active Mode.

[6073] Idle Mode is used to allow the chip to wait for the next instruction from the System.

[6074] Trim Mode is used to determine the clock speed of the chip and to trim the frequency during the initial programming stage of the chip (when Flash memory is garbage). The clock frequency must be trimmed via Trim Mode before Program Mode is used to store the program code.

[6075] Program Mode is used to load up the operating program code, and is required because the operating program code is stored in Flash memory instead of ROM (for security reasons).

[6076] Active Mode is used to execute the specific authentication command specified by the System. Program code is executed in Active Mode. When the results of the command have been returned to the System, the chip enters Idle Mode to wait for the next instruction.

[6077] 15.1.1 Idle Mode

[6078] The QA Chip starts up in Idle Mode. When the Chip is in Idle Mode, it waits for a command from the master by watching the primary id on the serial line.

[6079] If the primary id matches the global id (0x00, common to all QA Chips), and the following byte from the master is the Trim Mode id byte, the QA Chip enters Trim Mode and starts counting the number of internal clock cycles until the next byte is received.

[6080] If the primary id matches the global id (0x00, common to all QA Chips), and the following byte from the master is the Program Mode id byte, the QA Chip enters Program Mode.

[6081] If the primary id matches the global id (0x00, common to all QA Chips), and the following byte from the master is the Active Mode id byte, the QA Chip enters Active Mode and executes startup code, allowing the chip to set itself into a state to receive authentication commands (includes setting a local ID).

[6082] If the primary id matches the chip's local ID, and the following byte is a valid command code, the QA Chip enters Active Mode, allowing the command to be executed.

[6083] The valid 8-bit serial mode values sent after a global id are as shown in Table 238. They are specified to minimize the chances of them occurring by error after a global id (e.g. 0xFF and 0x00 are not used):

TABLE 238 Id byte values to place chip in specific mode

Value              Interpretation
10100101 (0xA5)    Trim Mode
10001110 (0x8E)    Program Mode
01111000 (0x78)    Active Mode
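The Idle Mode decision described in the preceding paragraphs can be sketched in Python as follows. This is only an illustration: the function name next_mode and the parameters local_id and valid_opcodes are hypothetical stand-ins for chip state, and remaining in Idle Mode when the byte after a global id is not one of the three id bytes is an assumption, since that case is not described.

```python
GLOBAL_ID = 0x00

# Id byte values from Table 238.
MODE_ID_BYTES = {
    0xA5: "Trim",
    0x8E: "Program",
    0x78: "Active",
}


def next_mode(primary_id, following_byte, local_id, valid_opcodes):
    """Return the mode an idle chip would enter for a two-byte preamble.

    local_id and valid_opcodes are hypothetical parameters. Staying in
    Idle on an unrecognised byte is an assumption of this sketch.
    """
    if primary_id == GLOBAL_ID:
        # Global id: the following byte selects the mode directly.
        return MODE_ID_BYTES.get(following_byte, "Idle")
    if primary_id == local_id and following_byte in valid_opcodes:
        # Local id followed by a valid command code.
        return "Active"
    return "Idle"
```

For example, next_mode(0x00, 0x8E, 0x12, {0x41}) selects Program Mode, while an id that matches neither the global id nor the local id leaves the chip idle.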

[6084] 15.1.2 Trim Mode

[6085] Trim Mode is enabled by sending a global id byte (0x00) followed by the Trim Mode command byte. The purpose of Trim Mode is to set the trim value (an internal register setting) of the internal ring oscillator so that Flash erasures and writes are of the correct duration. This is necessary due to the variation of the clock speed due to process variations. If writes and erasures are too long, the Flash memory will wear out faster than desired, and in some cases can even be damaged.

[6086] Trim Mode works by measuring the number of system clock cycles that occur inside the chip from the receipt of the Trim Mode command byte until the receipt of a data byte. When the data byte is received, the data byte is copied to the trim register and the current value of the count is transmitted to the outside world.

[6087] Once the count has been transmitted, the QA Chip returns to Idle Mode.

[6088] At reset, the internal trim register setting is set to a known value r. The external user can now perform the following operations:

[6089] send the global id+write followed by the Trim Mode command byte

[6090] send the 8-bit value v over a specified time t

[6091] send a stop bit to signify no more data

[6092] send the global id+read followed by the Trim Mode command byte

[6093] receive the count c

[6094] send a stop bit to signify no more data

[6095] At the end of this procedure, the trim register will be v, and the external user will know the relationship between external time t and internal time c. Therefore a new value for v can be calculated.

[6096] The Trim Mode procedure can be repeated a number of times, varying both t and v in known ways, measuring the resultant c. At the end of the process, the final value for v is established (and stored in the trim register for subsequent use in Program Mode). This value v must also be written to the flash for later use (every time the chip is placed in Active Mode for the first time after power-up).
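One way an external programming station might search for the final trim value is sketched below in Python. The sketch is not taken from the description above: it assumes a hypothetical callback measure_count(v, t_seconds) that returns the count c observed over external time t while trim value v is in effect, and it simply tries every 8-bit value and keeps the one whose implied clock frequency c/t is closest to a target. The fake_oscillator function is an invented stand-in for hardware so that the example runs.

```python
def choose_trim_value(measure_count, target_hz, t_seconds, candidates=range(256)):
    """Return the candidate trim value whose measured frequency is closest to target_hz.

    measure_count is a hypothetical callback hiding the serial transactions
    needed to obtain the count c for a given trim value v.
    """
    best_v = None
    best_error = float("inf")
    for v in candidates:
        c = measure_count(v, t_seconds)
        observed_hz = c / t_seconds  # relationship between external t and internal c
        error = abs(observed_hz - target_hz)
        if error < best_error:
            best_v, best_error = v, error
    return best_v


def fake_oscillator(v, t_seconds):
    """Invented hardware model: frequency rises linearly with v (an assumption)."""
    return round((8_000_000 + 20_000 * v) * t_seconds)
```

With the invented model, choose_trim_value(fake_oscillator, 10_000_000, 0.001) selects v = 100. A real station could use far fewer measurements, for example a binary search, if the oscillator response is monotonic.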

[6097] 15.1.3 Program Mode

[6098] Program Mode is enabled by sending a global id byte (0x00) followed by the Program Mode command byte.

[6099] The QA Chip determines whether or not the internal fuse has been blown (by reading 32-bit word 0 of the information block of flash memory).

[6100] If the fuse has been blown the Program Mode command is ignored, and the QA Chip returns to Idle Mode.

[6101] If the fuse is still intact, the chip enters Program Mode and erases the entire contents of Flash memory. The QA Chip then validates the erasure. If the erasure was successful, the QA Chip receives up to 4096 bytes of data corresponding to the new program code and variable data. The bytes are transferred in order byte₀ to byte₄₀₉₅.

[6102] Once all bytes of data have been loaded into Flash, the QA Chip returns to Idle Mode.

[6103] Note that Trim Mode functionality must be performed before a chip enters Program Mode for the first time.

[6104] Once the desired number of bytes have been downloaded in Program Mode, the LSS Master must wait for 80 μs (the time taken to write two bytes to flash at nybble rates) before sending the new transaction (e.g. Active Mode). Otherwise the last nybbles may not be written to flash.

[6105] 15.1.4 Active Mode

[6106] Active Mode is entered either by receiving a global id byte (0x00) followed by the Active Mode command byte, or by receiving a local id byte followed by a command opcode byte and an appropriate number of data bytes representing the required input parameters for that opcode.

[6107] In both cases, Active Mode causes execution of program code previously stored in the flash memory via Program Mode. As a result, we never enter Active Mode after Trim Mode, without a Program Mode in between. However, once programmed via Program Mode, a chip is allowed to enter Active Mode after power-up, since valid data will be in flash.

[6108] If Active Mode is entered by the global id mechanism, the QA Chip executes specific reset startup code, typically setting up the local id and other IO specific data.

[6109] If Active Mode is entered by the local id mechanism, the QA Chip executes specific code depending on the following byte, which functions as an opcode. The opcode command byte format is shown in Table 239:

TABLE 239 Command byte

Bits    Description
2-0     Opcode
5-3     opcode
7-6     count of number of bits set in opcode (0 to 3)

[6110] The interpretation of the 3-bit opcode is shown in Table 240:

TABLE 240 QA Chip opcodes

Op²    Mn³    Description
000    RST    Reset
001    RND    Random
010    RDM    Read M
011    TST    Test
100    WRM    Write M with no authentication
101    WRA    Write with Authentication (to M, P, or K)
110           chip specific - reserved for ChipF, ChipS etc
111           chip specific - reserved for ChipF, ChipS etc
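The field layout of Table 239 and the mnemonics of Table 240 can be illustrated with a short Python sketch. The function name decode_command_byte is hypothetical. The sketch checks only the bit count carried in bits 7-6 against the opcode in bits 2-0; the role of bits 5-3 is not spelled out here, so that field is returned without being interpreted.

```python
# Mnemonics from Table 240 (opcodes 110 and 111 are chip specific).
MNEMONICS = {
    0b000: "RST",
    0b001: "RND",
    0b010: "RDM",
    0b011: "TST",
    0b100: "WRM",
    0b101: "WRA",
}


def decode_command_byte(byte):
    """Split a command byte into its fields and check the bit-count field.

    Returns (opcode, bits_5_3, count, count_ok). bits_5_3 is returned
    uninterpreted, which is a limitation of this sketch.
    """
    opcode = byte & 0b111
    bits_5_3 = (byte >> 3) & 0b111
    count = (byte >> 6) & 0b11
    count_ok = count == bin(opcode).count("1")
    return opcode, bits_5_3, count, count_ok
```

For instance, the byte 0x83 carries opcode 011 (TST) with a bit count of 2, so the count check passes; the byte 0x03 carries the same opcode with a count of 0 and would be rejected by this check.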

[6111] The command byte is designed to ensure that errors in transmission are detected. Regular QA Chip commands are therefore comprised of an opcode plus any associated parameters. The commands are listed in Table 241:

TABLE 241 QA Chip commands

Command           Input: opcode    Input: additional parms     Output: return value
Reset             RST              -                           -
Random            RND              -                           [20]
Read              RDM              [1, 1, 20]                  [20, 64, 20]⁴
Test              TST              [1, 20, 64, 20]             89⁵ if successful, 76 if not
Write             WRM              [1, 64, 20]                 89 if successful, 76 if not
WriteAuth         WRA 76           [20, 64, 20]                89 if successful, 76 if not
ReplaceKey        WRA 89 76        [1, 20, 20, 20]             89 if successful, 76 if not
SetPermissions    WRA 89 89        [1, 1, 20, 4, 20]           [4]
SignM⁶            ChipS only       [1, 20, 20, 64, 20, 64]     [20, 64, 20]
SignP⁷            ChipS only       [1, 20, 20, 4, 20, 4]       [20, 64, 20]
GetProgKey        ChipF only       [1, 20]                     [20, 20, 20]
SetPartialKey     ChipF only       [1, 4]                      89 if successful, 76 if not

[6112] Apart from the Reset command, the next four commands are the commands most likely to be used during regular operation. The next three commands are used to provide authenticated writes (which are expected to be uncommon). The final set of commands (including SignM), are expected to be specially implemented on ChipS and ChipF QA Chips only.

[6113] The input parameters are sent in the specified order, with each parameter being sent least significant byte first and most significant byte last.

[6114] Return (output) values are read in the same way: least significant byte first and most significant byte last. The client must know how many bytes to retrieve. The QA Chip will time out and return to Idle Mode if an incorrect number of bytes is provided or read.
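The byte ordering described above corresponds to little-endian serialization. As a minimal Python sketch, assuming parameters are held as unsigned integers on the client side (the helper names encode_param and decode_param are hypothetical):

```python
def encode_param(value: int, length: int) -> bytes:
    """Serialize an unsigned parameter, least significant byte first."""
    return value.to_bytes(length, "little")


def decode_param(data: bytes) -> int:
    """Rebuild an unsigned value from bytes received least significant byte first."""
    return int.from_bytes(data, "little")
```

A 20-byte (160-bit) random number would therefore be sent with encode_param(r, 20), and the value 0x0102 sent as a 4-byte parameter appears on the line as 02 01 00 00.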

[6115] In most cases, the output bytes from one chip's command (the return values) can be fed directly as the input bytes to another chip's command. An example of this is the RND and RDM commands. The output data from a call to RND on a trusted QA Chip does not have to be kept by the System. Instead, the System can transfer the output bytes directly to the input of the non-trusted QA Chip's RDM command. The description of each command points out where this is so.

[6116] Each of the commands is examined in detail in the subsequent sections. Note that some algorithms are specifically designed because flash memory is assumed for the implementation of non-volatile variables.

[6117] 15.1.5 Non Volatile Variables

[6118] The memory within the QA Chip contains some non-volatile (Flash) memory to store the variables required by the authentication protocol. Table 242 summarizes the variables.

TABLE 242 Non volatile variables required by the authentication protocol

Name        Size (bits)                   Description
N           8                             Number of keys known to the chip
T           8                             Number of vectors M is broken into
K_(n)       160 per key, 160 for R_(K)    Array of N secret keys used for calculating F_(Kn)[X] where K_(n) is the nth element of the array. Each K_(n) must not be stored directly in the QA Chip; the stored form is described in Section 15.1.5.2.
R           160                           Current random number used to ensure time varying messages. Each chip instance must be seeded with a different initial value. Changes for each signature generation.
M_(T)       512 per M                     Array of T memory vectors. Only M₀ can be written to with an authorized write, while all Ms can be written to in an unauthorized write. Writes to M₀ are optimized for Flash usage, while updates to any other M_(n) are expensive with regards to Flash utilization, and are expected to be only performed once per section of M_(n). M₁ contains T and N in ReadOnly form so users of the chip can know these two values.
P_(T+N)     32 per P                      T + N element array of access permissions for each part of M. Entries n = {0 ... T − 1} hold access permissions for non-authenticated writes to M_(n) (no key required). Entries n = {T to T + N − 1} hold access permissions for authenticated writes to M₀ for K_(n). Permission choices for each part of M are Read Only, Read/Write, and Decrement Only.
MinTicks    32                            The minimum number of clock ticks between calls to key-based functions.

[6119] Note that since these variables are in Flash memory, writes should be minimized. It is not a simple matter to write a new value to replace the old. Care must be taken with flash endurance, and speed of access. This has an effect on the algorithms used to change Flash memory based registers. For example, Flash memory should not be used as a shift register.

[6120] A reset of the QA Chip has no effect on the non-volatile variables.

[6121] 15.1.5.1 M and P

[6122] M_(n) contains application specific state data, such as serial numbers, batch numbers, and amount of consumable remaining. M_(n) can be read using the Read command and written to via the Write and WriteA commands.

[6123] M₀ is expected to be updated frequently, while each part of M_(1-n) should only be written to once. Only M₀ can be written to via the WriteA command.

[6124] M₁ contains the operating parameters of the chip as shown in Table 243, and M_(2-n) are application specific.

TABLE 243 Interpretation of M₁

Length    Bits       Interpretation
8         7-0        Number of available keys
8         15-8       Number of available M vectors
16        31-16      Revision of chip
96        127-32     Manufacture id information
128       255-128    Serial number
8         263-256    Local id of chip
248       511-264    reserved

[6125] Each M_(n) is 512 bits in length, and is interpreted as a set of 16×32-bit words. Although M_(n) may contain a number of different elements, each 32-bit word differs only in write permissions. Each 32-bit word can always be read. Once in client memory, the 512 bits can be interpreted in any way chosen by the client. The different write permissions for each P are outlined in Table 244:

TABLE 244 Write permissions

Data type         Permission description
Read Only         Can never be written to
ReadWrite         Can always be written to
Decrement Only    Can only be written to if the new value is less than the old value. Decrement Only values can be any multiple of 32 bits.

[6126] To accomplish the protection required for writing, a 2-bit permission value P is defined for each of the 32-bit words. Table 245 defines the interpretation of the 2-bit permission bit-pattern:

TABLE 245 Permission bit interpretation

Bits    Op      Interpretation                                      Action taken during Write command
00      RW      ReadWrite                                           The new 32-bit value is always written to M[n].
01      MSR     Decrement Only (Most Significant Region)            The new 32-bit value is only written to M[n] if it is less than the value currently in M[n]. This is used for access to the Most Significant 16 bits of a Decrement Only number.
10      NMSR    Decrement Only (Not the Most Significant Region)    The new 32-bit value is only written to M[n] if M[n − 1] could also be written. The NMSR access mode allows multiple precision values of 32 bits and more (multiples of 32 bits) to decrement.
11      RO      Read Only                                           The new 32-bit value is ignored. M[n] is left unchanged.

[6127] The 16 sets of permission bits for each 512 bits of M are gathered together in a single 32-bit variable P, where bits 2n and 2n+1 of P correspond to word n of M.

[6128] Each 2-bit value is stored as a pair with the msb in bit 1, and the lsb in bit 0. Consequently, if words 0 to 5 of M had permission MSR, with words 6-15 of M permission RO, the 32-bit P variable would be 0xFFFFF555:

[6129] 11-11-11-11-11-11-11-11-11-11-01-01-01-01-01-01
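The packing of sixteen 2-bit permissions into P can be checked with a short Python sketch. The helper names pack_permissions and permission_of are hypothetical; the bit patterns are those of Table 245, and word n occupies bits 2n and 2n+1 of P.

```python
# 2-bit permission patterns from Table 245.
PERMISSION_BITS = {"RW": 0b00, "MSR": 0b01, "NMSR": 0b10, "RO": 0b11}


def pack_permissions(words):
    """Pack 16 permission names (word 0 first) into the 32-bit P value."""
    if len(words) != 16:
        raise ValueError("exactly 16 permissions are required, one per 32-bit word")
    p = 0
    for n, name in enumerate(words):
        # Word n occupies bits 2n (lsb) and 2n+1 (msb) of P.
        p |= PERMISSION_BITS[name] << (2 * n)
    return p


def permission_of(p, n):
    """Return the 2-bit permission pattern for word n of M."""
    return (p >> (2 * n)) & 0b11
```

Packing MSR for words 0 to 5 and RO for words 6 to 15, that is pack_permissions(["MSR"] * 6 + ["RO"] * 10), reproduces the value 0xFFFFF555 given above.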

[6130] During execution of a Write and WriteA command, the appropriate Permissions[n] is examined for each M[n] starting from n=15 (msw of M) to n=0 (lsw of M), and a decision made as to whether the new M[n] value will replace the old. Note that it is important to process the M[n] from msw to lsw to correctly interpret the access permissions.

[6131] Permissions are set and read using the QA Chip's SetPermissions command. The default for P is all 0s (RW) with the exception of certain parts of M₁.

[6132] Note that the Decrement Only comparison is unsigned, so any Decrement Only values that require negative ranges must be shifted into a positive range. For example, a consumable with a Decrement Only data item range of −50 to 50 must have the range shifted to be 0 to 100. The System must then interpret the range 0 to 100 as being −50 to 50. Note that most instances of Decrement Only ranges are N to 0, so there is no range shift required.
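The range shift in the example above amounts to subtracting the lower bound before storing and adding it back when interpreting. A minimal Python sketch, with hypothetical helper names to_stored and from_stored used on the System side:

```python
def to_stored(value, range_min):
    """Shift a signed application value into the unsigned range held in the chip."""
    stored = value - range_min
    if stored < 0:
        raise ValueError("value lies below the declared range")
    return stored


def from_stored(stored, range_min):
    """Interpret an unsigned stored value as the original signed application value."""
    return stored + range_min
```

With range_min = −50, the application values −50 and 50 are stored as 0 and 100, and the ordering of values is preserved, so an unsigned less-than comparison on the stored values still enforces a decrement.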

[6133] For Decrement Only data items, arrange the data in order from most significant to least significant 32-bit quantities from M[n] onward. The access mode for the most significant 32 bits (stored in M[n]) should be set to MSR. The remaining 32-bit entries for the data should have their permissions set to NMSR.

[6134] If erroneously set to NMSR, with no associated MSR region, each NMSR region will be considered independently instead of being a multi-precision comparison.

[6135] Examples of allocating M and Permission bits can be found in [86].

[6136] 15.1.5.2 K and R_(K)

[6137] K is the 160-bit secret key used to protect M and to ensure that the contents of M are valid (when M is read from a non-trusted chip). K is initially programmed after manufacture, and from that point on, K can only be updated to a new value if the old K is known. Since K must be kept secret, there is no command to directly read it.

[6138] K is used in the keyed one-way hash function HMAC-SHA1. As such it should be programmed with a physically generated random number, gathered from a physically random phenomenon. K must NOT be generated with a computer-run random number generator. The security of the QA Chips depends on K being generated in a way that is not deterministic.

[6139] Each K_(n) must not be stored directly in the QA Chip. Instead, each chip needs to store a single random number R_(K) (different for each chip), K_(n)⊕R_(K), and

K_(n)⊕R_(K). The stored K_(n)⊕R_(K) can be XORed with R_(K) to obtain the real K_(n). Although

K_(n)⊕R_(K) must be stored to protect against differential attacks, it is not used.

[6140] 15.1.5.3 R

[6141] R is a 160-bit random number seed that is set up after manufacture (when the chip is programmed) and from that point on, cannot be changed. R is used to ensure that each signed item contains time varying information (not chosen by an attacker), and each chip's R is unrelated from one chip to the next.

[6142] R is used during the Test command to ensure that the R from the previous call to Random was used as the session key in generating the signature during Read. Likewise, R is used during the WriteAuth command to ensure that the R from the previous call to Read was used as the session key during generation of the signature in the remote Authenticated chip.

[6143] The only invalid value for R is 0. This is because R is changed via a 160-bit maximal period LFSR (Linear Feedback Shift Register) with taps on bits 0, 2, 3, and 5, and is changed only by a successful call to a signature generating function (e.g. Test, WriteAuth).

[6144] The logical security of the QA Chip relies not only upon the randomness of K and the strength of the HMAC-SHA1 algorithm. To prevent an attacker from building a sparse lookup table, the security of the QA Chip also depends on the range of R over the lifetime of all Systems. What this means is that an attacker must not be able to deduce what values of R there are in produced and future Systems. Ideally, R should be programmed with a physically generated random number, gathered from a physically random phenomenon (must not be deterministic). R must NOT be generated with a computer-run random number generator.

[6145] 15.1.5.4 MinTicks

[6146] There are two mechanisms for preventing an attacker from generating multiple calls to key-based functions in a short period of time. The first is an internal ring oscillator that is temperature-filtered. The second mechanism is the 32-bit MinTicks variable, which is used to specify the minimum number of QA Chip clock ticks that must elapse between calls to key-based functions.

[6147] The MinTicks variable is set to a fixed value when the QA Chip is programmed. It could possibly be stored in M₁.

[6148] The effective value of MinTicks depends on the operating clock speed and the notion of what constitutes a reasonable time between key-based function calls (application specific). The duration of a single tick depends on the operating clock speed. This is the fastest speed of the ring oscillator generated clock (i.e. at the lowest valid operating temperature).

[6149] Once the duration of a tick is known, the MinTicks value can be set. The value for MinTicks will be the minimum number of ticks required to pass between calls to the key-based functions (there is no need to protect Random as this produces the same output each time it is called multiple times in a row). The value is the required real-time interval divided by the length of an operating tick.
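As a worked illustration of this calculation, the following Python sketch converts a desired minimum interval into a tick count using the fastest expected clock. The function name min_ticks, the choice of microseconds as the unit, rounding up to a whole tick, and the example figures are all assumptions of the sketch; only the 32-bit size of MinTicks comes from the description above.

```python
def min_ticks(interval_us: int, fastest_clock_hz: int) -> int:
    """Return the number of clock ticks spanning at least interval_us microseconds.

    fastest_clock_hz should be the fastest speed of the ring oscillator,
    so that the interval is still respected when ticks are shortest.
    """
    # Ceiling division using integer arithmetic only.
    ticks = -(-interval_us * fastest_clock_hz // 1_000_000)
    if ticks >= 1 << 32:
        raise ValueError("value does not fit in the 32-bit MinTicks variable")
    return ticks
```

For example, an interval of one second at a fastest clock of 10 MHz gives min_ticks(1_000_000, 10_000_000) = 10,000,000 ticks, which fits comfortably in 32 bits.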

[6150] It should be noted that the MinTicks variable only slows down an attacker and causes the attack to cost more since it does not stop an attacker using multiple System chips in parallel.

[6151] 15.1.6 GetProgramKey

[6152] Input: n, R_(E)=[1 byte, 20 bytes]

[6153] Output: R_(L), E_(Kx)[S_(Kn)[R_(E)|R_(L)|C₃]], S_(Kx)[R_(L)|E_(Kx)[S_(Kn)[R_(E)|R_(L)|C₃]]|C₃]=[20, 20, 20]

[6154] Changes: R_(L)

[6155] Note: The GetProgramKey command is only implemented in ChipF, and not in all QA Chips. The GetProgramKey command is used to produce the bytestream required for updating a specified key in ChipP. Only a QA Chip programmed with the correct values of the old K_(n) can respond correctly to the GetProgramKey request. The output bytestream from the GetProgramKey command can be fed as the input bytestream to the ReplaceKey command on the QA Chip being programmed (ChipP).

[6156] The input bytestream consists of the appropriate opcode followed by the desired key to generate the signature, followed by 20 bytes of R_(E) (representing the random number read in from ChipP).

[6157] The local random number R_(L) is advanced, and signed in combination with R_(E) and C₃ by the chosen key to generate a time varying secret number known to both ChipF and ChipP. This signature is then XORed with the new key K_(x) (this encrypts the new key). The first two output parameters are signed with the old key to ensure that ChipP knows it decoded K_(x) correctly.

[6158] This whole procedure should only be allowed a given number of times. The actual number can conveniently be stored in the local M₀[0] (e.g. word 0 of M₀) with ReadOnly permission. Of course another chip could perform an Authorised write to update the number (via a ChipS) should it be desired.

[6159] The GetProgramKey command is implemented by the following steps:

Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept R_(E)
If (M₀[0] = 0)
    Output 60 bytes of 0x00    # no more keys allowed to be generated from this ChipF
    Done
EndIf
Advance R_(L)
SIG ← S_(Kn)[R_(L)|R_(E)|C₃]    # calculation must take constant time
Tmp ← SIG ⊕ K_(X)
Output R_(L)
Output Tmp
Decrement M₀[0]    # reduce the number of allowable key generations by 1
SIG ← S_(KX)[R_(L)|Tmp|C₃]    # calculation must take constant time
Output SIG
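The steps above can be sketched in Python using the standard hmac and hashlib modules for S (HMAC-SHA1). Several things here are assumptions made for illustration: the value of the constant C₃ is a placeholder, R_(L) is passed in already advanced, the Flash integrity loop is omitted, and Python's hmac does not by itself guarantee the constant-time calculation the steps require. The recover_key function is a hypothetical view of the receiving side and is not a description of the ReplaceKey command.

```python
import hashlib
import hmac

C3 = b"\x03"  # placeholder value; the real constant C3 is not given here


def sign(key: bytes, message: bytes) -> bytes:
    """S_K[X]: 20-byte HMAC-SHA1 of message under key."""
    return hmac.new(key, message, hashlib.sha1).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def get_program_key(old_key, new_key, r_local, r_external, remaining):
    """Return (60-byte output, updated M0[0] count), following the steps above."""
    if remaining == 0:
        return bytes(60), 0  # no more keys may be generated
    sig = sign(old_key, r_local + r_external + C3)
    tmp = xor_bytes(sig, new_key)  # new key encrypted with the signature
    final_sig = sign(new_key, r_local + tmp + C3)
    return r_local + tmp + final_sig, remaining - 1


def recover_key(old_key, r_external, output):
    """Hypothetical receiver: decode the new key and check the final signature."""
    r_local, tmp, final_sig = output[:20], output[20:40], output[40:60]
    new_key = xor_bytes(sign(old_key, r_local + r_external + C3), tmp)
    expected = sign(new_key, r_local + tmp + C3)
    return new_key if hmac.compare_digest(expected, final_sig) else None
```

A receiver holding the same old key recovers the new key from the 60 output bytes, while a receiver with a different old key derives a different candidate key and fails the final signature check.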

[6160] 15.1.7 Random

[6161] Input: None

[6162] Output: R_(L)=[20 bytes]

[6163] Changes: None

[6164] The Random command is used by a client to obtain an input for use in a subsequent authentication procedure. Since the Random command requires no input parameters, it is therefore simply 1 byte containing the RND opcode.

[6165] The output of the Random command from a trusted QA Chip can be fed straight into the non-trusted chip's Read command as part of the input parameters. There is no need for the client to store them at all, since they are not required again. However, the Test command will only succeed if the data passed to the Read command was obtained first from the Random command.

[6166] If a caller only calls the Random function multiple times, the same output will be returned each time.

[6167] R will only advance to the next random number in the sequence after a successful call to a function that returns or tests a signature (e.g. Test, see Section 15.1.13 for more information).

[6168] The Random command is implemented by the following steps:

[6169] Loop through all of Flash, reading each word (will trigger checks)

[6170] Output R_(L)

[6171] 15.1.8 Read

[6172] Input: n, t, R_(E)=[1 byte, 1 byte, 20 bytes]

[6173] Output: R_(L), M_(Lt), S_(Kn)[R_(E)|R_(L)|C₁|M_(Lt)]=[20 bytes, 64 bytes, 20 bytes]

[6174] Changes: R_(L)

[6175] The Read command is used to read the entire state data (M_(t)) from a QA Chip. Only a QA Chip programmed with the correct value of K_(n) can respond correctly to the Read request. The output bytestream from the Read command can be fed as the input bytestream to the Test command on a trusted QA Chip for verification, with M_(t) stored for later use if Test returns success.

[6176] The input bytestream consists of the RDM opcode followed by the key number to use for the signature, which M to read, and the bytes 0-19 of R_(E). 23 bytes are transferred in total. R_(E) is obtained by calling the trusted QA Chip's Random command. The 20 bytes output by the trusted chip's Random command can therefore be fed directly into the non-trusted chip's Read command, with no need for these bytes to be stored by System.

[6177] Calls to Read must wait for MinTicksRemaining to reach 0 to ensure that a minimum time will elapse between calls to Read.

[6178] The output values are calculated, MinTicksRemaining is updated, and the signature is returned. The contents of M_(Lt) are transferred least significant byte to most significant byte. The signature S_(Kn)[R_(E)|R_(L)|C₁|M_(Lt)] must be calculated in constant time.

[6179] The next random number is generated from R using a 160-bit maximal period LFSR (tap selections on bits 5, 3, 2, and 0). The initial 160-bit value for R is set up when the chip is programmed, and can be any random number except 0 (an LFSR filled with 0s will produce a never-ending stream of 0s). R is transformed by XORing bits 0, 2, 3, and 5 together, and shifting all 160 bits right 1 bit using the XOR result as the input bit to b₁₅₉. The process is shown in FIG. 347.
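The update of R described above can be sketched in a few lines of Python. This is an illustration only: the name advance_r and the use of an arbitrary-precision integer to hold the 160-bit register are assumptions made for the example, not part of the described chip.

```python
R_BITS = 160
R_MASK = (1 << R_BITS) - 1


def advance_r(r: int) -> int:
    """Advance the 160-bit LFSR state by one step (hypothetical helper).

    Bits 0, 2, 3 and 5 are XORed together, the register is shifted right
    by one bit, and the XOR result becomes the new bit 159.
    """
    if not 0 < r <= R_MASK:
        # An all-zero register would only ever produce zeros.
        raise ValueError("R must be a non-zero 160-bit value")
    feedback = (r ^ (r >> 2) ^ (r >> 3) ^ (r >> 5)) & 1
    return (r >> 1) | (feedback << (R_BITS - 1))
```

For example, a register holding only bit 0 produces a feedback bit of 1, so the next state holds only bit 159.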

[6180] Care should be taken when updating R since it lives in Flash. Program code must assume power could be removed at any time.

[6181] The Read command is implemented with the following steps:
Wait for MinTicksRemaining to become 0
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Accept t
Restrict n to N
Restrict t to T
Accept R_(E)
Advance R_(L)
Output R_(L)
Output M_(Lt)
Sig ← S_(Kn)[R_(E)|R_(L)|C₁|M_(Lt)] # calculation must take constant time
MinTicksRemaining ← MinTicks
Output Sig
Wait for MinTicksRemaining to become 0

[6182] 15.1.9 Set Permissions

[6183] The SetPermissions command is used to securely update the contents of P_(p) (containing QA Chip permissions). The SetPermissions command only attempts to replace P_(p) if the new value is signed in combination with the chip's local R.

[6184] It is only possible to sign messages by knowing K_(n). This can be achieved by a call to the SignP command (because only a ChipS can know K_(n)). It means that without a chip that can be used to produce the required signature, a write of any value to P_(p) is not possible.

[6185] The process is very similar to Test, except that if the validation succeeds, the P_(E) input parameter is additionally ORed with the current value for P_(p). Note that this is an OR, and not a replace. Since the SetPermissions command only sets bits in P_(p), the effect is to allow the permission bits corresponding to M[n] to progress from RW to either MSR, NMSR, or RO.

[6186] The SetPermissions command is implemented with the following steps:
Wait for MinTicksRemaining to become 0
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept p
Restrict p to T+N
Accept R_(E)
Accept P_(E)
SIG_(L) ← S_(Kn)[R_(L)|R_(E)|P_(E)|C₂] # calculation must take constant time
Accept SIG_(E)
If (SIG_(E) = SIG_(L))
  Update R_(L)
  P_(P) ← P_(P) ∨ P_(E)
EndIf
Output P_(P) # success or failure will be determined by receiver
MinTicksRemaining ← MinTicks

[6187] 15.1.10 ReplaceKey

[6188] Input: n, R_(E), V, SIG_(E)=[1 byte, 20 bytes, 20 bytes, 20 bytes]

[6189] Output: Boolean (0x76=failure, 0x89=success)

[6190] Changes: K_(n), M_(L), R_(L)

[6191] The ReplaceKey command is used to replace the specified key in the QA Chip flash memory. However, K_(n) can only be replaced if the previous value is known. A return byte of 0x89 is produced if the key was successfully updated, while 0x76 is returned for failure.

[6192] A ReplaceKey command consists of the WR_(A) command opcode followed by 0x89, 0x76, and then the appropriate parameters. Note that the new key is not sent in the clear; it is sent encrypted with the signature of R_(L), R_(E) and C₃ (signed with the old key). The first two input parameters must be verified by generating a signature using the old key.

[6193] The ReplaceKey command is implemented with the following steps:
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept R_(E) # session key from ChipF
Accept V # encrypted key
SIG_(L) ← S_(Kn)[R_(E)|V|C₃] # calculation must take constant time
Accept SIG_(E)
If (SIG_(L) = SIG_(E)) # comparison must take constant time
  SIG_(L) ← S_(Kn)[R_(L)|R_(E)|C₃] # calculation must take constant time
  Advance R_(L)
  K_(E) ← SIG_(L) ⊕ V
  K_(n) ← K_(E) # involves storing (K_(E) ⊕ R_(K)) and (¬K_(E) ⊕ R_(K))
  Output 0x89 # success
Else
  Output 0x76 # failure
EndIf
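The key transport in ReplaceKey can be illustrated with a short Python sketch. Several things here are assumptions made for the example: the signature function S_K is modelled as HMAC-SHA1, the constant C3 is given a placeholder value, and the function names wrap_new_key and replace_key are hypothetical. Advancing R_(L) and the Flash storage details are omitted.

```python
import hashlib
import hmac

C3 = b"\x03"  # placeholder value; the real constant C3 is not given here


def sign(key: bytes, *parts: bytes) -> bytes:
    """S_K[a|b|...] modelled as HMAC-SHA1 over the concatenation (assumption)."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_new_key(k_old: bytes, k_new: bytes, r_l: bytes, r_e: bytes):
    """Sender side: encrypt the new key and sign the request with the old key."""
    v = xor(sign(k_old, r_l, r_e, C3), k_new)   # V = S_Kold[R_L|R_E|C3] xor K_new
    sig_e = sign(k_old, r_e, v, C3)             # SIG_E = S_Kold[R_E|V|C3]
    return v, sig_e


def replace_key(k_old: bytes, r_l: bytes, r_e: bytes, v: bytes, sig_e: bytes):
    """Chip side: returns (status byte, key held by the chip afterwards)."""
    sig_l = sign(k_old, r_e, v, C3)
    if not hmac.compare_digest(sig_l, sig_e):   # constant-time comparison
        return 0x76, k_old
    k_new = xor(sign(k_old, r_l, r_e, C3), v)   # K_E = SIG_L xor V
    return 0x89, k_new
```

Because V is the new key XORed with a signature that only a holder of the old key can compute, an observer of the bytestream sees neither key in the clear.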

[6194] 15.1.11 SignM

[6195] Input: n, R_(X), R_(E), M_(E), SIG_(E), M_(desired)=[1 byte, 20 bytes, 20 bytes, 64 bytes, 32 bytes]

[6196] Output: R_(L), M_(new), S_(Kn)[R_(E)|R_(L)|C₁|M_(new)]=[20 bytes, 64 bytes, 20 bytes]

[6197] Changes: R_(L)

[6198] Note: The SignM command is only implemented in ChipS, and not in all QA Chips. The SignM command is used to produce a valid signed M for use in an authenticated write transaction. Only a QA Chip programmed with the correct value of K_(n) can respond correctly to the SignM request. The output bytestream from the SignM command can be fed as the input bytestream to the WriteAuth command on a different QA Chip.

[6199] The input bytestream consists of the SMR opcode followed by 1 byte containing the key number to use for generating the signature, 20 bytes of R_(X) (representing the number passed in as R to ChipU's READ command, i.e. typically 0), the output from the READ command (namely R_(E), M_(E), and SIG_(E)), and finally the desired M to write to ChipU.

[6200] The SignM command only succeeds when SIG_(E)=S_(K)[R_(X)|R_(E)|C₁|M_(E)], indicating that the request was generated from a chip that knows K. This generation and comparison must take the same amount of time regardless of whether the input parameters are correct or not. If the times are not the same, an attacker can gain information about which bits of the supplied signature are incorrect. If the signatures match, then R_(L) is updated to be the next random number in the sequence.

[6201] Since the SignM function generates signatures, the function must wait for the MinTicksRemaining register to reach 0 before processing takes place.

[6202] Once all the inputs have been verified, a new memory vector is produced by applying a specially stored P value (e.g. word 1 of M₀) and M_(desired) against M_(E). Effectively, it is performing a regular Write, but with separate P against someone else's M. The M_(new) is signed with an updated R_(L) (and the passed in R_(E)), and all three values are output (the random number R_(L), M_(new), and the signature). The time taken to generate this signature must be the same regardless of the inputs.

[6203] Typically, the SignM command will be acting as a form of consumable command, so that a given ChipS can only generate a given number of signatures. The actual number can conveniently be stored in M₀ (e.g. word 0 of M₀) with ReadOnly permissions. Of course another chip could perform an Authorised write to update the number (using another ChipS) should it be desired.

[6204] The SignM command is implemented with the following steps:
Wait for MinTicksRemaining to become 0
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept R_(X) # don't care what this number is
Accept R_(E)
Accept M_(E)
SIG_(L) ← S_(Kn)[R_(X)|R_(E)|C₁|M_(E)] # calculation must take constant time
Accept SIG_(E)
Accept M_(desired)
If ((SIG_(E) ≠ SIG_(L)) OR (M_(L)[0] = 0)) # fail if bad signature or if allowed sigs = 0
  Output appropriate number of 0 # report failure
  Done
EndIf
Update R_(L)
# Create the new version of M in ram from W and Permissions
# This is the same as the core process of Write function
# except that we don't write the results back to M
DecEncountered ← 0
EqEncountered ← 0
Permissions = M_(L)[1] # assuming M₀ contains appropriate permissions
For n ← msw to lsw # (word 15 to 0)
  AM ← Permissions[n]
  LT ← (M_(desired)[n] < M_(E)[n]) # comparison is unsigned
  EQ ← (M_(desired)[n] = M_(E)[n])
  WE ← (AM = RW) ∨ ((AM = MSR) ∧ LT) ∨ ((AM = NMSR) ∧ (DecEncountered ∨ LT))
  DecEncountered ← ((AM = MSR) ∧ LT) ∨ ((AM = NMSR) ∧ DecEncountered) ∨ ((AM = NMSR) ∧ EqEncountered ∧ LT)
  EqEncountered ← ((AM = MSR) ∧ EQ) ∨ ((AM = NMSR) ∧ EqEncountered ∧ EQ)
  If ((¬WE) ∧ (M_(E)[n] ≠ M_(desired)[n]))
    Output appropriate number of 0 # report failure
  EndIf
EndFor
# At this point, M_(desired) is correct
Output R_(L)
Output M_(desired) # M_(desired) is now effectively M_(new)
Sig ← S_(Kn)[R_(E)|R_(L)|C₁|M_(desired)] # calculation must take constant time
MinTicksRemaining ← MinTicks
Decrement M_(L)[0] # reduce the number of allowable signatures by 1
Output Sig

[6205] 15.1.12 SignP

[6206] Input: n, R_(E), P_(desired)=[1 byte, 20 bytes, 4 bytes]

[6207] Output: R_(L), S_(Kn)[R_(E)|R_(L)|P_(desired)|C₂]=[20 bytes, 20 bytes]

[6208] Changes: R_(L)

[6209] Note: The SignP command is only implemented in ChipS, and not in all QA Chips.

[6210] The SignP command is used to produce a valid signed P for use in a SetPermissions transaction.

[6211] Only a QA Chip programmed with the correct value of K_(n) can respond correctly to the SignP request. The output bytestream from the SignP command can be fed as the input bytestream to the SetPermissions command on a different QA Chip.

[6212] The input bytestream consists of the SMP opcode followed by 1 byte containing the key number to use for generating the signature, 20 bytes of R_(E) (representing the number obtained from ChipU's RND command), and finally the desired P to write to ChipU.

[6213] Since the SignP function generates signatures, the function must wait for the MinTicksRemaining register to reach 0 before processing takes place.

[6214] Once all the inputs have been verified, the P_(desired) is signed with an updated R_(L) (and the passed in R_(E)), and both values are output (the random number R_(L) and the signature). The time taken to generate this signature must be the same regardless of the inputs.

[6215] Typically, the SignP command will be acting as a form of consumable command, so that a given ChipS can only generate a given number of signatures. The actual number can conveniently be stored in M₀ (e.g. word 0 of M₀) with ReadOnly permissions. Of course another chip could perform an Authorised write to update the number (using another ChipS) should it be desired.

[6216] The SignP command is implemented with the following steps:
Wait for MinTicksRemaining to become 0
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept R_(E)
Accept P_(desired)
If (M_(L)[0] = 0) # fail if allowed sigs = 0
  Output appropriate number of 0 # report failure
  Done
EndIf
Update R_(L)
Output R_(L)
Sig ← S_(Kn)[R_(E)|R_(L)|P_(desired)|C₂] # calculation must take constant time
MinTicksRemaining ← MinTicks
Decrement M_(L)[0] # reduce the number of allowable signatures by 1
Output Sig

[6217] 15.1.13 Test

[6218] Input: n, R_(E), M_(E), SIG_(E)=[1 byte, 20 bytes, 64 bytes, 20 bytes]

[6219] Output: Boolean (0x76=failure, 0x89=success)

[6220] Changes: R_(L)

[6221] The Test command is used to authenticate a read of an M from a non-trusted QA Chip.

[6222] The Test command consists of the TST command opcode followed by input parameters: n, R_(E), M_(E), and SIG_(E). The byte order is least significant byte to most significant byte for each command component. All but the first input parameter bytes are obtained as the output bytes from a Read command to a non-trusted QA Chip. The entire data does not have to be stored by the client. Instead, the bytes can be passed directly to the trusted QA Chip's Test command, and only M should be kept from the Read.

[6223] Calls to Test must wait for the MinTicksRemaining register to reach 0.

[6224] S_(Kn)[R_(L)|R_(E)|C₁|M_(E)] is then calculated, and compared against the input signature SIG_(E). If they are different, R_(L) is not changed, and 0x76 is returned to indicate failure. If they are the same, then R_(L) is updated to be the next random number in the sequence and 0x89 is returned to indicate success. Updating R_(L) only after success forces the caller to use a new random number (via the Random command) each time a successful authentication is performed.

[6225] The calculation of S_(Kn)[R_(L)|R_(E)|C₁|M_(E)] and the comparison against SIG_(E) must take identical time so that the time to evaluate the comparison in the TST function is always the same. Thus no attacker can compare execution times or number of bits processed before an output is given.

[6226] The Test command is implemented with the following steps:
Wait for MinTicksRemaining to become 0
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept R_(E)
Accept M_(E)
SIG_(L) ← S_(Kn)[R_(L)|R_(E)|C₁|M_(E)] # calculation must take constant time
Accept SIG_(E)
If (SIG_(E) = SIG_(L))
  Update R_(L)
  Output 0x89 # success
Else
  Output 0x76 # report failure
EndIf
MinTicksRemaining ← MinTicks

[6227] 15.1.14 Write

[6228] Input: t, M_(new), SIG_(E)=[1 byte, 64 bytes, 20 bytes]

[6229] Output: Boolean (0x76=failure, 0x89=success)

[6230] Changes: M_(t)

[6231] The Write command is used to update M_(t) according to the permissions in P_(t). The WR command by itself is not secure, since a clone QA Chip may simply return success every time. Therefore a Write command should be followed by an authenticated read of M_(t) (e.g. via a Read command) to ensure that the change was actually made.

[6232] The Write command is called by passing the WR command opcode followed by which M to be updated, the new data to be written to M, and a digital signature of M. The data is sent least significant byte to most significant byte.

[6233] The ability to write to a specific 32-bit word within M_(t) is governed by the corresponding Permissions bits as stored in P_(t). P_(t) can be set using the SetPermissions command.

[6234] The fact that M_(t) is Flash memory must be taken into account when writing the new value to M. It is possible for an attacker to remove power at any time. Only the changes to M should be stored, for maximum utilization. The longevity of M will also need to be taken into account, which may result in the location of M being updated.

[6235] The signature is not keyed, since it must be generated by the consumable user.

[6236] The Write command is implemented with the following steps:
Loop through all of Flash, reading each word (will trigger checks)
Accept t
Restrict t to T
Accept M_(E) # new M
Accept SIG_(E)
SIG_(L) = Generate SHA1[M_(E)]
If (SIG_(L) ≠ SIG_(E))
  output 0x76 # failure due to invalid signature
  exit
EndIf
DecEncountered ← 0
EqEncountered ← 0
For i ← msw to lsw # (word 15 to 0)
  P ← P_(t)[i]
  LT ← (M_(E)[i] < M_(t)[i]) # comparison is unsigned
  EQ ← (M_(E)[i] = M_(t)[i])
  WE ← (P = RW) ∨ ((P = MSR) ∧ LT) ∨ ((P = NMSR) ∧ (DecEncountered ∨ LT))
  DecEncountered ← ((P = MSR) ∧ LT) ∨ ((P = NMSR) ∧ DecEncountered) ∨ ((P = NMSR) ∧ EqEncountered ∧ LT)
  EqEncountered ← ((P = MSR) ∧ EQ) ∨ ((P = NMSR) ∧ EqEncountered ∧ EQ)
  If ((¬WE) ∧ (M_(E)[i] ≠ M_(t)[i]))
    output 0x76 # failure due to wanting a change but not allowed it
  EndIf
EndFor
# At this point, M_(E) (desired) is correct to be written to the flash
M_(t) ← M_(E) # update flash
output 0x89 # success
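The per-word permission check at the core of Write can be sketched in Python as below. The function name write_allowed, the string encoding of the permission values, and the list-of-integers representation of M (index 0 being the least significant word) are assumptions for illustration; the signature check and the Flash update are omitted.

```python
RW, MSR, NMSR, RO = "RW", "MSR", "NMSR", "RO"


def write_allowed(m_old, m_new, perms) -> bool:
    """Return True if every changed word is permitted to change.

    Words are visited from most significant (highest index) to least
    significant, carrying the DecEncountered and EqEncountered flags along.
    """
    dec_encountered = False
    eq_encountered = False
    for i in range(len(m_old) - 1, -1, -1):
        p = perms[i]
        lt = m_new[i] < m_old[i]          # unsigned comparison
        eq = m_new[i] == m_old[i]
        we = (p == RW) or (p == MSR and lt) or (
            p == NMSR and (dec_encountered or lt))
        # Both flags are updated from their values before this word.
        dec_encountered, eq_encountered = (
            (p == MSR and lt)
            or (p == NMSR and dec_encountered)
            or (p == NMSR and eq_encountered and lt),
            (p == MSR and eq) or (p == NMSR and eq_encountered and eq),
        )
        if not we and not eq:
            return False                  # change wanted but not allowed
    return True
```

With a two-word value whose upper word is MSR and lower word is NMSR, the pair behaves as a decrement-only quantity: a decrease in the upper word lets the lower word take any value, while an unchanged upper word only lets the lower word decrease.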

[6237] 15.1.15 WriteAuth

[6238] Input: n, R_(E), M_(E), SIG_(E)=[1 byte, 20 bytes, 64 bytes, 20 bytes]

[6239] Output: Boolean (0x76=failure, 0x89=success)

[6240] Changes: M₀, R_(L)

[6241] The WriteAuth command is used to securely replace the entire contents of M₀ (containing QA Chip application specific data) according to the P_(T+n). The WriteAuth command only attempts to replace M₀ if the new value is signed in combination with the chip's local R.

[6242] It is only possible to sign messages by knowing K_(n). This can be achieved by a call to the SignM command (because only a ChipS can know K_(n)). It means that without a chip that can be used to produce the required signature, a write of any value to M₀ is not possible.

[6243] The process is very similar to Write, except that if the validation succeeds, the M_(E) input parameter is processed against M₀ using permissions P_(T+n).

[6244] The WriteAuth command is implemented with the following steps:
Wait for MinTicksRemaining to become 0
Loop through all of Flash, reading each word (will trigger checks)
Accept n
Restrict n to N
Accept R_(E)
Accept M_(E)
SIG_(L) ← S_(Kn)[R_(L)|R_(E)|C₁|M_(E)] # calculation must take constant time
Accept SIG_(E)
If (SIG_(E) = SIG_(L))
  Update R_(L)
  DecEncountered ← 0
  EqEncountered ← 0
  For i ← msw to lsw # (word 15 to 0)
    P ← P_(T+n)[i]
    LT ← (M_(E)[i] < M₀[i]) # comparison is unsigned
    EQ ← (M_(E)[i] = M₀[i])
    WE ← (P = RW) ∨ ((P = MSR) ∧ LT) ∨ ((P = NMSR) ∧ (DecEncountered ∨ LT))
    DecEncountered ← ((P = MSR) ∧ LT) ∨ ((P = NMSR) ∧ DecEncountered) ∨ ((P = NMSR) ∧ EqEncountered ∧ LT)
    EqEncountered ← ((P = MSR) ∧ EQ) ∨ ((P = NMSR) ∧ EqEncountered ∧ EQ)
    If ((¬WE) ∧ (M_(E)[i] ≠ M₀[i]))
      output 0x76 # failure due to wanting a change but not allowed it
    EndIf
  EndFor
  # At this point, M_(E) (desired) is correct to be written to the flash
  M₀ ← M_(E) # update flash
  output 0x89 # success
EndIf
MinTicksRemaining ← MinTicks

[6245] 16 Manufacture

[6246] This chapter makes some general comments about the manufacture and implementation of authentication chips. While the comments presented here are general, see [84] for a detailed description of an implementation of an authentication chip.

[6247] The authentication chip algorithms do not constitute a strong encryption device. The net effect is that they can be safely manufactured in any country (including the USA) and exported to anywhere in the world.

[6248] The circuitry of the authentication chip must be resistant to physical attack. A summary of manufacturing implementation guidelines is presented, followed by specification of the chip's physical defenses (ordered by attack).

[6249] Note that manufacturing comments are in addition to any legal protection undertaken, such as patents, copyright, and license agreements (for example, penalties if caught reverse engineering the authentication chip).

[6250] 16.1 Guidelines for Manufacturing

[6251] The following are general guidelines for implementation of an authentication chip in terms of manufacture (see [84] for a detailed description of an authentication chip). No special security is required during the manufacturing process.

[6252] Standard process

[6253] Minimum size (if possible)

[6254] Clock Filter

[6255] Noise Generator

[6256] Tamper Prevention and Detection circuitry

[6257] Protected memory with tamper detection

[6258] Boot circuitry for loading program code

[6259] Special implementation of FETs for key data paths

[6260] Data connections in polysilicon layers where possible

[6261] OverUnderPower Detection Unit

[6262] No test circuitry

[6263] Transparent epoxy packaging

[6264] Finally, as a general note to manufacturers of Systems, the data line to the System authentication chip and the data line to the Consumable authentication chip must not be the same line. See Section 16.2.3 on page 736.

[6265] 16.1.1 Standard Process

[6266] The authentication chip should be implemented with a standard manufacturing process (such as Flash). This is necessary to:

[6267] allow a great range of manufacturing location options

[6268] take advantage of well-defined and well-behaved technology

[6269] reduce cost

[6270] Note that the standard process still allows physical protection mechanisms.

[6271] 16.1.2 Minimum Size

[6272] The authentication chip must have a low manufacturing cost in order to be included as the authentication mechanism for low cost consumables. It is therefore desirable to keep the chip size as low as reasonably possible.

[6273] Each authentication chip requires 962 bits of non-volatile memory. In addition, the storage required for optimized HMAC-SHA1 is 1024 bits. The remainder of the chip (state machine, processor, CPU or whatever is chosen to implement Protocol C1) must be kept to a minimum in order that the number of transistors is minimized and thus the cost per chip is minimized. The circuit areas that process the secret key information or could reveal information about the key should also be minimized (see Section 16.1.8 on page 734 for special data paths).

[6274] 16.1.3 Clock Filter

[6275] The authentication chip circuitry is designed to operate within a specific clock speed range. Since the user directly supplies the clock signal, it is possible for an attacker to attempt to introduce race-conditions in the circuitry at specific times during processing. An example of this is where a high clock speed (higher than the circuitry is designed for) may prevent an XOR from working properly, and of the two inputs, the first may always be returned. These styles of transient fault attacks can be very efficient at recovering secret key information, and have been documented in [5] and [1]. The lesson to be learned from this is that the input clock signal cannot be trusted.

[6276] Since the input clock signal cannot be trusted, it must be limited to operate up to a maximum frequency. This can be achieved in a number of ways.

[6277] One way to filter the clock signal is to use an edge detect unit passing the edge on to a delay, which in turn enables the input clock signal to pass through.

[6278] FIG. 348 shows clock signal flow within the Clock Filter.

[6279] The delay should be set so that the maximum clock speed is a particular frequency (e.g. about 4 MHz). Note that this delay is not programmable; it is fixed.

[6280] The filtered clock signal would be further divided internally as required.

[6281] 16.1.4 Noise Generator

[6282] Each authentication chip should contain a noise generator that generates continuous circuit noise. The noise will interfere with other electromagnetic emissions from the chip's regular activities and add noise to the I_(dd) signal. Placement of the noise generator is not an issue on an authentication chip due to the length of the emission wavelengths.

[6283] The noise generator is used to generate electronic noise, multiple state changes each clock cycle, and as a source of pseudo-random bits for the Tamper Prevention and Detection circuitry (see Section 16.1.5 on page 731).

[6284] A simple implementation of a noise generator is a 64-bit maximal period LFSR seeded with a non-zero number. The clock used for the noise generator should be running at the maximum clock rate for the chip in order to generate as much noise as possible.

[6285] 16.1.5 Tamper Prevention and Detection Circuitry

[6286] A set of circuits is required to test for and prevent physical attacks on the authentication chip. However, what is actually detected as an attack may not be an intentional physical attack. It is therefore important to distinguish between two types of detection in an authentication chip:

[6287] where you can be certain that a physical attack has occurred.

[6288] where you cannot be certain that a physical attack has occurred.

[6289] The two types of detection differ in what is performed as a result of the detection. In the first case, where the circuitry can be certain that a true physical attack has occurred, erasure of Flash memory key information is a sensible action. In the second case, where the circuitry cannot be sure if an attack has occurred, there is still certainly something wrong. Action must be taken, but the action should not be the erasure of secret key information. A suitable action to take in the second case is a chip RESET. If what was detected was an attack that has permanently damaged the chip, the same conditions will occur next time and the chip will RESET again. If, on the other hand, what was detected was part of the normal operating environment of the chip, a RESET will not harm the key.

[6290] A good example of an event that circuitry cannot have knowledge about is a power glitch. The glitch may be an intentional attack, attempting to reveal information about the key. It may, however, be the result of a faulty connection, or simply the start of a power-down sequence. It is therefore best to only RESET the chip, and not erase the key. If the chip was powering down, nothing is lost. If the System is faulty, repeated RESETs will cause the consumer to get the System repaired. In both cases the consumable is still intact.

[6291] A good example of an event that circuitry can have knowledge about is the cutting of a data line within the chip. If this attack is somehow detected, it could only be a result of a faulty chip (manufacturing defect) or an attack. In either case, the erasure of the secret information is a sensible step to take.

[6292] Consequently each authentication chip should have 2 Tamper Detection Lines: one for definite attacks, and one for possible attacks. Connected to these Tamper Detection Lines would be a number of Tamper Detection test units, each testing for different forms of tampering. In addition, we want to ensure that the Tamper Detection Lines and Circuits themselves cannot also be tampered with.

[6293] At one end of the Tamper Detection Line is a source of pseudo-random bits (clocking at high speed compared to the general operating circuitry). The Noise Generator circuit described above is an adequate source. The generated bits pass through two different paths: one carries the original data, and the other carries the inverse of the data. The wires carrying these bits are in the layer above the general chip circuitry (for example, the memory, the key manipulation circuitry etc.). The wires must also cover the random bit generator. The bits are recombined at a number of places via an XOR gate. If the bits are different (they should be), a 1 is output, and used by the particular unit (for example, each output bit from a memory read should be ANDed with this bit value). The lines finally come together at the Flash memory Erase circuit, where a complete erasure is triggered by a 0 from the XOR. Attached to the line are a number of triggers, each detecting a physical attack on the chip. Each trigger has an oversize nMOS transistor attached to GND. The Tamper Detection Line physically goes through this nMOS transistor. If the test fails, the trigger causes the Tamper Detect Line to become 0. The XOR test will therefore fail on either this clock cycle or the next one (on average), thus RESETting or erasing the chip.

[6294] FIG. 349 illustrates the basic principle of a Tamper Detection Line in terms of tests and the XOR connected to either the Erase or RESET circuitry.

[6295] The Tamper Detection Line must go through the drain of an output transistor for each test, as illustrated by FIG. 350.

[6296] It is not possible to break the Tamper Detect Line since this would stop the flow of 1s and 0s from the random source. The XOR tests would therefore fail. As the Tamper Detect Line physically passes through each test, it is not possible to eliminate any particular test without breaking the Tamper Detect Line.

[6297] It is important that the XORs take values from a variety of places along the Tamper Detect Lines in order to reduce the chances of an attack. FIG. 351 illustrates the taking of multiple XORs from the Tamper Detect Line to be used in the different parts of the chip. Each of these XORs can be considered to be generating a ChipOK bit that can be used within each unit or sub-unit.

[6298] A sample usage would be to have an OK bit in each unit that is ANDed with a given ChipOK bit each cycle. The OK bit is loaded with 1 on a RESET. If OK is 0, that unit will fail until the next RESET. If the Tamper Detect Line is functioning correctly, the chip will either RESET or erase all key information. If the RESET or erase circuitry has been destroyed, then this unit will not function, thus thwarting an attacker.
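The behaviour of a ChipOK bit and a per-unit OK bit can be modelled with a small Python simulation. This is a software illustration of a hardware mechanism; the function name simulate_unit_ok and the choice to model a tripped trigger as forcing the non-inverted path to 0 are assumptions for the example.

```python
def simulate_unit_ok(random_bits, tamper_from=None) -> int:
    """Return a unit's OK bit after clocking through random_bits.

    Each cycle the pseudo-random bit travels on one path and its inverse
    on the other; ChipOK is the XOR of the two paths. From cycle
    tamper_from onward, a tripped trigger pulls the Tamper Detect Line
    (the non-inverted path in this model) to 0.
    """
    ok = 1  # the OK bit is loaded with 1 on a RESET
    for cycle, bit in enumerate(random_bits):
        line = bit
        inverse = bit ^ 1
        if tamper_from is not None and cycle >= tamper_from:
            line = 0
        chip_ok = line ^ inverse   # 1 while the two paths still differ
        ok &= chip_ok              # once 0, stays 0 until the next RESET
    return ok
```

In this model a forced 0 is only noticed on a cycle where the random source emits a 1, which is consistent with the statement that the XOR test fails on either the current clock cycle or the next one on average.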

[6299] The destination of the RESET and Erase line and associated circuitry is very context sensitive. It needs to be protected in much the same way as the individual tamper tests. There is no point generating a RESET pulse if the attacker can simply cut the wire leading to the RESET circuitry. The actual implementation will depend very much on what is to be cleared at RESET, and how those items are cleared.

[6300] Finally, FIG. 352 shows how the Tamper Lines cover the noise generator circuitry of the chip. The generator and NOT gate are on one level, while the Tamper Detect Lines run on a level above the generator.

[6301] 16.1.6 Protected Memory with Tamper Detection

[6302] It is not enough to simply store secret information or program code in Flash memory. The Flash memory and RAM must be protected from an attacker who would attempt to modify (or set) a particular bit of program code or key information. The mechanism used must be compatible with the Tamper Detection Circuitry (described above).

[6303] The first part of the solution is to ensure that the Tamper Detection Line passes directly above each Flash or RAM bit. This ensures that an attacker cannot probe the contents of Flash or RAM. A breach of the covering wire is a break in the Tamper Detection Line. The breach causes the Erase signal to be set, thus deleting any contents of the memory. The high frequency noise on the Tamper Detection Line also obscures passive observation.

[6304] The second part of the solution for Flash is to use multi-level data storage, but only to use a subset of those multiple levels for valid bit representations. Normally, when multi-level Flash storage is used, a single floating gate holds more than one bit. For example, a 4-voltage-state transistor can represent two bits. Assuming a minimum and maximum voltage representing 00 and 11 respectively, the two middle voltages represent 01 and 10. In the authentication chip, we can use the two middle voltages to represent a single bit, and consider the two extremes to be invalid states. If an attacker attempts to force the state of a bit one way or the other by closing or cutting the gate's circuit, an invalid voltage (and hence invalid state) results.
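The idea of treating only the two middle levels of a four-level cell as valid can be sketched as follows. The mapping of level 1 to bit 0 and level 2 to bit 1, the exception name TamperDetected and the function name decode_cell are all assumptions for illustration; the text does not say which middle voltage represents which bit value.

```python
class TamperDetected(Exception):
    """Raised when a cell holds one of the two extreme (invalid) levels."""


# Levels 0..3 stand for the four voltage states, lowest to highest.
# Assumed mapping: level 1 -> bit 0, level 2 -> bit 1.
VALID_LEVELS = {1: 0, 2: 1}


def decode_cell(level: int) -> int:
    """Decode one four-level Flash cell into a single bit."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    if level not in VALID_LEVELS:
        # Forcing a gate fully one way or the other lands on an extreme.
        raise TamperDetected(f"invalid cell level {level}")
    return VALID_LEVELS[level]
```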

[6305] The second part of the solution for RAM is to use a parity bit. The data part of the register can be checked against the parity bit (which will not match after an attack).
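A parity check on a register value can be sketched as below. The use of even parity and the helper names parity_of and register_intact are assumptions; the text only states that a parity bit is used. A single parity bit detects any odd number of flipped bits, so a single forced bit is caught.

```python
def parity_of(value: int) -> int:
    """Even-parity bit (assumed): 1 if the value has an odd number of 1 bits."""
    return bin(value).count("1") & 1


def register_intact(value: int, stored_parity: int) -> bool:
    """Check the data part of a register against its stored parity bit."""
    return parity_of(value) == stored_parity
```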

[6306] The bits coming from Flash and RAM can therefore be validated by a number of test units (one per bit) connected to the common Tamper Detection Line. The Tamper Detection circuitry would be the first circuitry the data passes through (thus stopping an attacker from cutting the data lines).

[6307] While the multi-level Flash protection is enough for non-secret information, such as program code, R, and MinTicks, it is not sufficient for protecting K₁ and K₂. If an attacker adds electrons to a gate (see Section 5.7.2.15 on page 656) representing a single bit of K₁, and the chip boots up yet doesn't activate the Tamper Detection Line, the key bit must have been a 0. If it does activate the Tamper Detection Line, it must have been a 1. For this reason, all other non-volatile memory can activate the Tamper Detection Line, but K₁ and K₂ must not. Consequently Checksum is used to check for tampering of K₁ and K₂. A signature of the expanded form of K₁ and K₂ (i.e. 320 bits instead of 160 bits for each of K₁ and K₂) is produced, and the result compared against the Checksum. Any non-match causes a clear of all key information.

[6308] 16.1.7 Boot Circuitry for Loading Program Code

[6309] Program code should be kept in multi-level Flash instead of ROM, since ROM is subject to being altered in a non-testable way. A boot mechanism is therefore required to load the program code into Flash memory (Flash memory is in an indeterminate state after manufacture).

[6310] The boot circuitry must not be in ROM, since the boot code could then be modified in an undetectable way. A small state-machine would suffice.

[6311] The boot circuitry must erase all Flash memory, check to ensure the erasure worked, and then load the program code. Flash memory must be erased before loading the program code. Otherwise an attacker could put the chip into the boot state, and then load program code that simply extracted the existing keys. The state machine must also check to ensure that all Flash memory has been cleared (to ensure that an attacker has not cut the Erase line) before loading the new program code.
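The erase, verify, then load ordering can be sketched as a small Python routine. This is a software model of what the text describes as a hardware state machine; the function name boot_load, the erased-cell value 0xFF and the erase callback are assumptions for illustration.

```python
ERASED = 0xFF  # assumed value of an erased Flash byte


def boot_load(flash: bytearray, program: bytes, erase) -> bool:
    """Erase Flash, verify the erasure, then load program code.

    erase is a callable standing in for the Erase circuitry. Returns
    False, leaving Flash untouched by the loader, if any byte survived
    the erase (for example because the Erase line was cut).
    """
    erase(flash)
    if any(byte != ERASED for byte in flash):
        return False               # erasure did not work: refuse to load
    if len(program) > len(flash):
        return False
    flash[:len(program)] = program
    return True
```

Verifying before loading is what prevents newly loaded code from reading keys left over from before the boot sequence.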

[6312] The loading of program code must be undertaken by the secure Programming Station before secret information (such as keys) can be loaded. This step must be undertaken as the first part of the programming process.

[6313] 16.1.8 Special Implementation of FETs for Key Data Paths

[6314] The normal FET implementation of a CMOS inverter (a pMOS transistor combined with an nMOS transistor) is shown in FIG. 353.

[6315] During the transition, there is a small period of time where both the nMOS transistor and the pMOS transistor have an intermediate resistance. The resultant power-ground short circuit causes a temporary increase in the current, and in fact accounts for the majority of current consumed by a CMOS device. A small amount of infrared light is emitted during the short circuit, and can be viewed through the silicon substrate (silicon is transparent to infrared light). A small amount of light is also emitted during the charging and discharging of the transistor gate capacitance and transmission line capacitance.

[6316] For circuitry that manipulates secret key information, such information must be kept hidden. An alternative non-flashing CMOS implementation should therefore be used for all data paths that manipulate the key or a partially calculated value that is based on the key.

[6317] The use of two non-overlapping clocks φ1 and φ2 can provide a non-flashing mechanism. φ1 is connected to a second gate of all nMOS transistors, and φ2 is connected to a second gate of all pMOS transistors. The transition can only take place in combination with the clock. Since φ1 and φ2 are non-overlapping, the pMOS and nMOS transistors will not have a simultaneous intermediate resistance. The setup is shown in FIG. 354.
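
The non-overlap requirement on φ1 and φ2 can be stated as a simple property of sampled waveforms. The following Python sketch is illustrative only: the representation of each clock as a list of 0/1 samples and the function name `clocks_non_overlapping` are assumptions, not part of the circuit described here.

```python
from typing import Sequence


def clocks_non_overlapping(phi1: Sequence[int], phi2: Sequence[int]) -> bool:
    """Return True if the two sampled clocks are never high at the same time.

    Each sequence holds one 0/1 sample per time step (a hypothetical
    representation chosen for illustration).
    """
    if len(phi1) != len(phi2):
        raise ValueError("waveforms must cover the same time steps")
    return all(not (a and b) for a, b in zip(phi1, phi2))


# Two-phase clock with dead time between the phases: never both high.
good_phi1 = [1, 1, 0, 0, 0, 0, 1, 1]
good_phi2 = [0, 0, 0, 1, 1, 0, 0, 0]

# Here phi2 rises too early, so both clocks are high at step 1.
bad_phi2 = [0, 1, 1, 1, 1, 0, 0, 0]
```

A clock pair that fails this check would allow the simultaneous intermediate resistance that the non-flashing arrangement is meant to rule out.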

[6318] Finally, regular CMOS inverters can be positioned near critical non-Flashing CMOS components. These inverters should take their input signal from the Tamper Detection Line above. Since the Tamper Detection Line operates multiple times faster than the regular operating circuitry, the net effect will be a high rate of light-bursts next to each non-Flashing CMOS component. Since a bright light overwhelms observation of a nearby faint light, an observer will not be able to detect what switching operations are occurring in the chip proper. These regular CMOS inverters will also effectively increase the amount of circuit noise, reducing the SNR and obscuring useful EMI.

[6319] There are a number of side effects due to the use of non-Flashing CMOS:

[6320] The effective speed of the chip is reduced by twice the rise time of the clock per clock cycle. This is not a problem for an authentication chip.

[6321] The amount of current drawn by the non-Flashing CMOS is reduced (since the short circuits do not occur). However, this is offset by the use of regular CMOS inverters.

[6322] Routing of the clocks increases chip area, especially since multiple versions of φ1 and φ2 are required to cater for different levels of propagation. The estimated chip area is double that of a regular implementation.

[6323] Design of the non-Flashing areas of the authentication chip is slightly more complex than a regular CMOS design. In particular, standard cell components cannot be used, making these areas full custom. This is not a problem for something as small as an authentication chip, particularly when the entire chip does not have to be protected in this manner.

[6324] 16.1.9 Connections in Polysilicon Layers Where Possible

[6325] Wherever possible, the connections along which the key or secret data flows should be made in the polysilicon layers. Where necessary, they can be in metal 1, but must never be in the top metal layer (containing the Tamper Detection Lines).

[6326] 16.1.10 OverUnderPower Detection Unit

[6327] Each authentication chip requires an OverUnderPower Detection Unit to prevent Power Supply Attacks. An OverUnderPower Detection Unit detects power glitches and tests the power level against a Voltage Reference to ensure it is within a certain tolerance. The Unit contains a single Voltage Reference and two comparators. The OverUnderPower Detection Unit would be connected into the RESET Tamper Detection Line, thus causing a RESET when triggered.
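
The behaviour of one Voltage Reference feeding two comparators can be modelled in a few lines. This is a minimal sketch under stated assumptions: the reference voltage `V_REF`, the `TOLERANCE` band and the function name `reset_triggered` are hypothetical values chosen for illustration, not figures from this specification.

```python
V_REF = 3.3        # hypothetical reference voltage, in volts
TOLERANCE = 0.10   # hypothetical tolerance: 10% either side of V_REF


def reset_triggered(v_supply: float,
                    v_ref: float = V_REF,
                    tolerance: float = TOLERANCE) -> bool:
    """Model of the unit: True means the RESET Tamper Detection Line fires."""
    over = v_supply > v_ref * (1.0 + tolerance)    # first comparator
    under = v_supply < v_ref * (1.0 - tolerance)   # second comparator
    return over or under
```

In this model a supply that falls towards zero during power-down also trips the under-voltage comparator, so a RESET accompanies every loss of power.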

[6328] A side effect of the OverUnderPower Detection Unit is that as the voltage drops during a power-down, a RESET is triggered, thus erasing any work registers.

[6329] 16.1.11 No Test Circuitry

[6330] Test hardware on an authentication chip could very easily introduce vulnerabilities. As a result, the authentication chip should not contain any BIST or scan paths.

[6331] The authentication chip must therefore be testable with external test vectors. This should be possible since the authentication chip is not complex.

[6332] 16.1.12 Transparent Epoxy Packaging

[6333] The authentication chip needs to be packaged in transparent epoxy so it can be photo-imaged by the programming station to prevent Trojan horse attacks. The transparent packaging does not compromise the security of the authentication chip since an attacker can fairly easily remove a chip from its packaging. For more information see Section 16.2.20 on page 743 and [86].

[6334] 16.2 Resistance to Physical Attacks

[6335] While this chapter only describes manufacture in general terms (since this document does not cover a specific implementation of a Protocol C1 authentication chip), we can still make some observations about such a chip's resistance to physical attack. A description of the general form of each physical attack can be found in Section 5.7.2 on page 652.

[6336] 16.2.1 Reading ROM

[6337] This attack depends on the key being stored in an addressable ROM. Since each authentication chip stores its authentication keys in internal Flash memory and not in an addressable ROM, this attack is irrelevant.

[6338] 16.2.2 Reverse Engineering the Chip

[6339] Reverse engineering a chip is only useful when the security of authentication lies in the algorithm alone. However, our authentication chips rely on a secret key, and not on the secrecy of the algorithm. Our authentication algorithm is, by contrast, public, and in any case, an attacker of a high volume consumable is assumed to have been able to obtain detailed plans of the internals of the chip.

[6340] In light of these factors, reverse engineering the chip itself, as opposed to the stored data, poses no threat.

[6341] 16.2.3 Usurping the Authentication Process

[6342] There are several forms this attack can take, each with varying degrees of success. In all cases, it is assumed that a clone manufacturer will have access to both the System and the consumable designs.

[6343] An attacker may attempt to build a chip that tricks the System into returning a valid code instead of generating an authentication code. This attack is not possible for two reasons. The first reason is that System authentication chips and Consumable authentication chips, although physically identical, are programmed differently. In particular, the RD opcode and the RND opcode are the same, as are the WR and TST opcodes. A System authentication chip cannot perform a RD command since every call is interpreted as a call to RND instead. The second reason this attack would fail is that separate serial data lines are provided from the System to the System and Consumable authentication chips. Consequently neither chip can see what is being transmitted to or received from the other.

[6344] If the attacker builds a clone chip that ignores WR commands (which decrement the consumable remaining), Protocol C1 ensures that the subsequent RD will detect that the WR did not occur. The System will therefore not go ahead with the use of the consumable, thus thwarting the attacker. The same is true if an attacker simulates loss of contact before authentication: since the authentication does not take place, the use of the consumable doesn't occur.

[6345] An attacker is therefore limited to modifying each System in order for clone consumables to be accepted (see Section 16.2.4 on page 737 for details of resistance to this attack).

[6346] 16.2.4 Modification of System

[6347] The simplest method of modification is to replace the System's authentication chip with one that simply reports success for each call to TST. This can be thwarted by System calling TST several times for each authentication, with the first few times providing false values, and expecting a fail from TST. The final call to TST would be expected to succeed. The number of false calls to TST could be determined by some part of the returned result from RD or from the system clock. Unfortunately an attacker could simply rewire System so that the new System clone authentication chip can monitor the returned result from the consumable chip or clock. The clone System authentication chip would only return success when that monitored value is presented to its TST function. Clone consumables could then return any value as the hash result for RD, as the clone System chip would declare that value valid. There is therefore no point for the System to call the System authentication chip multiple times, since a rewiring attack will only work for the System that has been rewired, and not for all Systems.

[6348] A similar form of attack on a System is a replacement of the System ROM. The ROM program code can be altered so that the Authentication never occurs. There is nothing that can be done about this, since the System remains in the hands of a consumer. Of course this would void any warranty, but the consumer may consider the alteration worthwhile if the clone consumable were extremely cheap and more readily available than the original item.

[6349] The System/consumable manufacturer must therefore determine how likely an attack of this nature is. Such a study must consider the pricing structure of Systems and Consumables, frequency of System service, advantage to the consumer of having a physical modification performed, and where consumers would go to get the modification performed.

[6350] The likelihood of physical alteration increases with the perceived artificiality of the consumable marketing scheme. It is one thing for a consumable to be protected against clone manufacturers. It is quite another for a consumable's market to be protected by a form of exclusive licensing arrangement that creates what is viewed by consumers as artificial markets. In the former case, owners are not so likely to go to the trouble of modifying their system to allow a clone manufacturer's goods. In the latter case, consumers are far more likely to modify their System. A case in point is DVD. Each DVD is marked with a region code, and will only play in a DVD player from that region. Thus a DVD from the USA will not play in an Australian player, and a DVD from Japan, Europe or Australia will not play in a USA DVD player. Given that certain DVD titles are not available in all regions, or because of quality differences, pricing differences or timing of releases, many consumers have had their DVD players modified to accept DVDs from any region. The modification is usually simple (it often involves soldering a single wire), voids the owner's warranty, and often costs the owner some money. But the interesting thing to note is that the change is not made so the consumer can use clone consumables; the consumer will still only buy real consumables, but from different regions. The modification is performed to remove what is viewed as an artificial barrier, placed on the consumer by the movie companies. In the same way, a System/Consumable scheme that is viewed as unfair will result in people making modifications to their Systems.

[6351] The limit case of modifying a system is for a clone manufacturer to provide a completely clone System which takes clone consumables. This may be simple competition or violation of patents. Either way, it is beyond the scope of the authentication chip and depends on the technology or service being cloned.

[6352] 16.2.5 Direct Viewing of Chip Operation by Conventional Probing

[6353] In order to view the chip operation, the chip must be operating. However, the Tamper Prevention and Detection circuitry covers those sections of the chip that process or hold the key. It is not possible to view those sections through the Tamper Prevention lines.

[6354] An attacker cannot simply slice the chip past the Tamper Prevention layer, for this will break the Tamper Detection Lines and cause an erasure of all keys at power-up. Simply destroying the erasure circuitry is not sufficient, since the multiple ChipOK bits (now all 0) feeding into multiple units within the authentication chip will cause the chip's regular operating circuitry to stop functioning.

[6355] To set up the chip for an attack, then, requires the attacker to delete the Tamper Detection lines, stop the Erasure of Flash memory, and somehow rewire the components that relied on the ChipOK lines. Even if all this could be done, the act of slicing the chip to this level will most likely destroy the charge patterns in the non-volatile memory that holds the keys, making the process fruitless.

[6356] 16.2.6 Direct Viewing of the Non-Volatile Memory

[6357] If the authentication chip were sliced so that the floating gates of the Flash memory were exposed, without discharging them, then the keys could probably be viewed directly using an STM or SKM. However, slicing the chip to this level without discharging the gates is probably impossible. Using wet etching, plasma etching, ion milling, or chemical mechanical polishing will almost certainly discharge the small charges present on the floating gates. This is true of regular Flash memory, but even more so of multi-level Flash memory.

[6358] 16.2.7 Viewing the Light Bursts Caused by State Changes

[6359] All sections of circuitry that manipulate secret key information are implemented in the non-Flashing CMOS described above. This prevents the emission of the majority of light bursts. Regular CMOS inverters placed in close proximity to the non-Flashing CMOS will hide any faint emissions caused by capacitor charge and discharge. The inverters are connected to the Tamper Detection circuitry, so they change state many times (at the high clock rate) for each non-Flashing CMOS state change.

[6360] 16.2.8 Viewing the Keys Using an SEPM

[6361] An SEPM attack can be simply thwarted by adding a metal layer to cover the circuitry. However an attacker could etch a hole in the layer, so this is not an appropriate defense.

[6362] The Tamper Detection circuitry described above will shield the signal as well as cause circuit noise. The noise will actually be a greater signal than the one that the attacker is looking for. If the attacker attempts to etch a hole in the noise circuitry covering the protected areas, the chip will not function, and the SEPM will not be able to read any data.

[6363] An SEPM attack is therefore fruitless.

[6364] 16.2.9 Monitoring EMI

[6365] The Noise Generator described above will cause circuit noise. The noise will interfere with other electromagnetic emissions from the chip's regular activities and thus obscure any meaningful reading of internal data transfers.

[6366] 16.2.10 Viewing I_(dd) Fluctuations

[6367] The solution against this kind of attack is to decrease the SNR in the I_(dd) signal. This is accomplished by increasing the amount of circuit noise and decreasing the amount of signal.

[6368] The Noise Generator circuit (which also acts as a defense against EMI attacks) will also cause enough state changes each cycle to obscure any meaningful information in the I_(dd) signal.

[6369] In addition, the special Non-Flashing CMOS implementation of the key-carrying data paths of the chip prevents current from flowing when state changes occur. This has the benefit of reducing the amount of signal.

[6370] 16.2.11 Differential Fault Analysis

[6371] Differential fault bit errors are introduced in a non-targeted fashion by ionization, microwave radiation, and environmental stress. The most likely effect of an attack of this nature is a change in Flash memory (causing an invalid state) or RAM (bad parity). Invalid states and bad parity are detected by the Tamper Detection Circuitry, and cause an erasure of the key.

[6372] Since the Tamper Detection Lines cover the key manipulation circuitry, any error introduced in the key manipulation circuitry will be mirrored by an error in a Tamper Detection Line. If the Tamper Detection Line is affected, the chip will either continually RESET or simply erase the key upon a power-up, rendering the attack fruitless.

[6373] Rather than relying on a non-targeted attack and hoping that “just the right part of the chip is affected in just the right way”, an attacker is better off trying to introduce a targeted fault (such as overwrite attacks, gate destruction etc.). For information on these targeted fault attacks, see the relevant sections below.

[6374] 16.2.12 Clock Glitch Attacks

[6375] The Clock Filter (described above) eliminates the possibility of clock glitch attacks.

[6376] 16.2.13 Power Supply Attacks

[6377] The OverUnderPower Detection Unit (described above) eliminates the possibility of power supply attacks.

[6378] 16.2.14 Overwriting ROM

[6379] Authentication chips store program code, keys and secret information in Flash memory, and not in ROM. This attack is therefore not possible.

[6380] 16.2.15 Modifying EEPROM/Flash

[6381] Authentication chips store program code, keys and secret information in multi-level Flash memory. However the Flash memory is covered by two Tamper Prevention and Detection Lines. If either of these lines is broken (in the process of destroying a gate via a laser-cutter) the attack will be detected on power-up, and the chip will either RESET (continually) or erase the keys from Flash memory. This process is described in Section 16.1.6 on page 733.

[6382] Even if an attacker is able to somehow access the bits of Flash and destroy or short out the gate holding a particular bit, this will force the bit to have no charge or a full charge. These are both invalid states for the authentication chip's usage of the multi-level Flash memory (only the two middle states are valid). When that data value is transferred from Flash, detection circuitry will cause the Erasure Tamper Detection Line to be triggered, thereby erasing the remainder of Flash memory and RESETing the chip. This is true for program code and non-secret information. As key data is read from multi-level Flash memory, it is not immediately checked for validity (otherwise information about the key is given away). Instead, a specific key validation mechanism is used to protect the secret key information.
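
The validity rule for non-secret data read from multi-level Flash can be sketched as follows. The model is illustrative only: it assumes four charge levels per cell (none, one third, two thirds, full), and the mapping of the two middle levels to bit values in `VALID_LEVELS` is a hypothetical choice, as are the names `TamperDetected` and `read_non_secret_bits`.

```python
class TamperDetected(Exception):
    """Stands in for the Erasure Tamper Detection Line being triggered."""


# Assumed cell levels: 0 = no charge, 1 = one third, 2 = two thirds,
# 3 = full charge. Only the two middle levels are valid; which one
# encodes 0 and which encodes 1 is an assumption for this sketch.
VALID_LEVELS = {1: 0, 2: 1}


def read_non_secret_bits(levels):
    """Decode cell levels into bits, raising on any invalid (tampered) cell.

    This check applies to program code and non-secret data only; key
    cells are validated by a separate mechanism.
    """
    bits = []
    for position, level in enumerate(levels):
        if level not in VALID_LEVELS:
            raise TamperDetected(f"invalid charge level at cell {position}")
        bits.append(VALID_LEVELS[level])
    return bits
```

A destroyed or shorted gate lands on level 0 or level 3, so under these assumptions it cannot be read back as either a valid 0 or a valid 1.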

[6383] An attacker could theoretically etch off the upper levels of the chip, and deposit enough electrons to change the state of the multi-level Flash memory by ⅓. If the beam is of high enough energy it might be possible to focus the electron beam through the Tamper Prevention and Detection Lines. As a result, the authentication chip must perform a validation of the keys before replying to the Random, Test or Read commands. The SHA-1 algorithm must be run on the keys, and the results compared against an internal checksum value. This gives an attacker a 1 in 2¹⁶⁰ chance of tricking the chip, which is the same chance as guessing either of the keys.
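
The key validation step can be sketched with Python's standard `hashlib` module. This is a minimal sketch under stated assumptions: hashing the simple concatenation of the two 160-bit keys, and the helper names `make_checksum`, `keys_valid` and `validate_or_clear`, are illustrative choices and not the chip's actual checksum format.

```python
import hashlib
import hmac


def make_checksum(k1: bytes, k2: bytes) -> bytes:
    """160-bit SHA-1 checksum over both keys (concatenation is assumed)."""
    return hashlib.sha1(k1 + k2).digest()


def keys_valid(k1: bytes, k2: bytes, checksum: bytes) -> bool:
    """Compare the recomputed checksum with the stored one."""
    return hmac.compare_digest(make_checksum(k1, k2), checksum)


def validate_or_clear(keys: dict, checksum: bytes) -> bool:
    """Validate before answering any command; clear all keys on a mismatch."""
    if keys_valid(keys["K1"], keys["K2"], checksum):
        return True
    keys["K1"] = bytes(len(keys["K1"]))
    keys["K2"] = bytes(len(keys["K2"]))
    return False
```

Because the check is made over the keys as a whole, a failed validation reveals only that some tampering occurred, not which bit was altered or what its value was.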

[6384] A Modify EEPROM/Flash attack is therefore fruitless.

[6385] 16.2.16 Gate Destruction Attacks

[6386] Gate Destruction Attacks rely on the ability of an attacker to modify a single gate to cause the chip to reveal information during operation. However any circuitry that manipulates secret information is covered by one of the two Tamper Prevention and Detection lines. If either of these lines is broken (in the process of destroying a gate) the attack will be detected on power-up, and the chip will either RESET (continually) or erase the keys from Flash memory.

[6387] To launch this kind of attack, an attacker must first reverse-engineer the chip to determine which gate(s) should be targeted. Once the location of the target gates has been determined, the attacker must break the covering Tamper Detection line, stop the Erasure of Flash memory, and somehow rewire the components that rely on the ChipOK lines. Rewiring the circuitry cannot be done without slicing the chip, and even if it could be done, the act of slicing the chip to this level will most likely destroy the charge patterns in the non-volatile memory that holds the keys, making the process fruitless.

[6388] 16.2.17 Overwrite Attack

[6389] An overwrite attack relies on being able to set individual bits of the key without knowing the previous value. It relies on probing the chip, as in the conventional probing attack, and destroying gates, as in the gate destruction attack. Both of these attacks (as explained in their respective sections) will not succeed due to the use of the Tamper Prevention and Detection Circuitry and ChipOK lines.

[6390] However, even if the attacker is able to somehow access the bits of Flash and destroy or short out the gate holding a particular bit, this will force the bit to have no charge or a full charge. These are both invalid states for the authentication chip's usage of the multi-level Flash memory (only the two middle states are valid). When that data value is transferred from Flash, detection circuitry will cause the Erasure Tamper Detection Line to be triggered, thereby erasing the remainder of Flash memory and RESETing the chip. In the same way, a parity check on tampered values read from RAM will cause the Erasure Tamper Detection Line to be triggered.

[6391] An overwrite attack is therefore fruitless.

[6392] 16.2.18 Memory Remanence Attack

[6393] Any working registers or RAM within the authentication chip may be holding part of the authentication keys when power is removed. The working registers and RAM would continue to hold the information for some time after the removal of power. If the chip were sliced so that the gates of the registers/RAM were exposed, without discharging them, then the data could probably be viewed directly using an STM.

[6394] The first defense can be found above, in the description of defense against power glitch attacks. When power is removed, all registers and RAM are cleared, just as the RESET condition causes a clearing of memory.

[6395] The chances, then, are less for this attack to succeed than for a reading of the Flash memory. RAM charges (by nature) are more easily lost than Flash memory. The slicing of the chip to reveal the RAM will certainly cause the charges to be lost (if they haven't been lost simply due to the memory not being refreshed and the time taken to perform the slicing).

[6396] This attack is therefore fruitless.

[6397] 16.2.19 Chip Theft Attack

[6398] There are distinct phases in the lifetime of an authentication chip. Chips can be stolen at any of these stages:

[6399] After manufacture, but before programming of key

[6400] After programming of key, but before programming of state data

[6401] After programming of state data, but before insertion into the consumable or system

[6402] After insertion into the system or consumable

[6403] A theft in between the chip manufacturer and programming station would only provide the clone manufacturer with blank chips. This merely compromises the sale of authentication chips, not anything authenticated by the authentication chips. Since the programming station is the only mechanism with consumable and system product keys, a clone manufacturer would not be able to program the chips with the correct key. Clone manufacturers would be able to program the blank chips for their own Systems and Consumables, but it would be difficult to place these items on the market without detection.

[6404] The second form of theft can only happen in a situation where an authentication chip passes through two or more distinct programming phases. This is possible, but unlikely. In any case, the worst situation is where no state data has been programmed, so all of M is read/write. If this were the case, an attacker could attempt to launch an adaptive chosen text attack on the chip. The HMAC-SHA1 algorithm is resistant to such attacks. For more information see Section 14.7 on page 699.

[6405] The third form of theft would have to take place in between the programming station and the installation factory. The authentication chips would already be programmed for use in a particular system or for use in a particular consumable. The only use these chips have to a thief is to place them into a clone System or clone Consumable. Clone systems are irrelevant, since a cloned System would not even require an authentication chip. For clone Consumables, such a theft would limit the number of cloned products to the number of chips stolen. A single theft should not create a supply constant enough to provide clone manufacturers with a cost-effective business.

[6406] The final form of theft is where the System or Consumable itself is stolen. When the theft occurs at the manufacturer, physical security protocols must be enhanced. If the theft occurs anywhere else, it is a matter of concern only for the owner of the item and the police or insurance company. The security mechanisms that the authentication chip uses assume that the consumables and systems are in the hands of the public. Consequently, having them stolen makes no difference to the security of the keys.

[6407] 16.2.20 Trojan Horse Attack

[6408] A Trojan horse attack involves an attacker inserting a fake authentication chip into the programming station and retrieving the same chip after it has been programmed with the secret key information. The difficulty of these two tasks depends on both logical and physical security, but this is an expensive attack: the attacker has to manufacture a false authentication chip, and it will only be useful where the effort is worth the gain. For example, obtaining the secret key for a specific car's authentication chip is most likely not worth an attacker's efforts, while the key for a printer's ink cartridge may be very valuable.

[6409] The problem arises if the programming station is unable to tell a Trojan horse authentication chip from a real one, which is the problem of authenticating the authentication chip.

[6410] One solution to the authentication problem is for the manufacturer to have a programming station attached to the end of the production line. Chips passing the manufacture QA tests are programmed with the manufacturer's secret key information. The chip can therefore be verified by the C1 authentication protocol, and give information such as the expected batch number, serial number etc. The information can be verified and recorded, and the valid chip can then be reprogrammed with the System or Consumable key and state data. An attacker would have to substitute an authentication chip with a Trojan horse programmed with the manufacturer's secret key information and copied batch number data from the removed authentication chip. This is only possible if the manufacturer's secret key is compromised (the key is changed regularly and not known by a human) or if the physical security at the manufacturing plant is compromised at the end of the manufacturing chain.

[6411] Even if the solution described were to be undertaken, the possibility of a Trojan horse attack does not go away; it is merely moved to the manufacturer's physical location. A better solution requires no physical security at the manufacturing location.

[6412] The preferred solution, then, is to use transparent epoxy on the chip's packaging and to image the chip before programming it. Once the chip has been mounted for programming it is in a known fixed orientation. It can therefore be high resolution photo-imaged and X-rayed from multiple directions, and the images compared against “signature” images. Any chip not matching the image signature is treated as a Trojan horse and rejected.

[6413] 1 Refill of Ink in Printers—Printer Based Refill Device

[6414] 1.1 Functional Purpose

[6415] The functional purpose of the printer based refill device is as follows:

[6416] To refill ink into printers by physically connecting the refill device to the printer.

[6417] To ensure that the correct ink is used for the correct operation of the printer (i.e. will not damage the printhead).

[6418] To ensure an accurate measure of ink is transferred from the refilling device to the printer during refills.

[6419] The refill device is controlled by the printer. Apart from the QA Chip¹ the refill device has no other processing power.

[6420] 1.2 Basic Components of the Refill Device

[6421] FIG. 355 shows the components of the printer based refill device.

[6422] The printer based refill device will consist of the following components:

[6423] An ink reservoir, which stores the ink. Each refill device will allow ink reservoirs of various capacities. When the ink reservoir empties out, it is either replaced by another reservoir containing more ink of the same or a different type, or refilled (for example through a refill station as described in Section 2 and Section 3).

[6424] An ink output device, which dispenses ink to the printer being refilled when physically connected to the printer.

[6425] A QA Chip and associated circuitry, which stores the amount of ink in the reservoir along with the attributes of the ink in a digital format.

[6426] The electrical connections to the QA Chip.

[6427] NB: No additional microprocessors are required to be present in the refill device. Hence the refill device uses the processing power of the printer to oversee the refilling process.

[6428] An ink transfer mechanism (optional), which controls the flow of ink from the refill device to the printer and is controlled by the printer. Therefore the control connections for the ink transfer mechanism will be connected to the printer.

[6429] Alternatively, the ink transfer mechanism could be in the printer. Refer to Section 1.3.

[6430] 1.3 Printer Description and Functions

[6431] Printers which will be refilled by these refilling devices must have the following components:

[6432] A microprocessor assembly which will control the refill procedure as described in Section 1.4. The microprocessor assembly will access the QA Chip and ink transfer mechanism of the refill device.

[6433] A QA Chip storing the ink amount remaining in the printer.

[6434] An optional ink transfer mechanism to control the flow of ink from the refill device to the printer. This ink transfer mechanism must be present in the printer if the refill device doesn't have one of its own.

[6435] 1.4 Operational Procedure

[6436] The operational procedure can be divided into two parts:

[6437] Refilling printers using the refill device.

[6438] Refilling of the ink reservoir in the refill device. See Section 2 and Section 3.

[6439] 1.4.1 Refilling of Printers

[6440] FIG. 356 shows a printer being refilled by a printer based refill device. The ink transfer mechanism is located in the printer in this case. The ink transfer mechanism could also be located in the refill device as described in Section 1.2.

[6441] The following is a description of the refilling of printers using the printer based refill device:

[6442] The ink output device of the refilling device is connected to the printer.

[6443] The QA Chip electrical connection is connected to the printer.

[6444] The refill option is selected on the user interface of the printer. The microprocessor assembly in the printer will then do the following:

[6445] a. Read ink attributes (for example ink type, ink characteristics, ink colour, ink manufacturer etc) stored in the QA Chip of the ink reservoir unit. Refer to [1].

[6446] b. Compare the ink attributes against those required by the printer for correct operation. This may require reading of data from the QA Chip in the printer.

[6447] c. Only if Step b is successful, then do the following:

[6448] i. Determine the amount of ink to be transferred by any or all ofthe following means, ensuring that the reservoir has enough ink for thetransfer:

[6449] Fixed amount (e.g. based on a pre-programmed value or printermodel).

[6450] User-selectable amount.

[6451] ii. Decrement the amount of ink transferred from the QA Chip inthe refill station and increment the QA Chip in the printer (whichstores the amount of ink in the printer) with corresponding ink amount.

[6452] iii. Command the ink transfer mechanism to release the ink to theprinter through the output device.
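By way of illustration only, steps a to c of the printer refill procedure can be sketched as follows. The class QAChip, its fields and the function refill_printer are hypothetical names introduced for this sketch; real QA Chips are accessed through signed reads and writes rather than plain attribute access, and the ink transfer mechanism is represented here only by the return value.

```python
from dataclasses import dataclass


@dataclass
class QAChip:
    """Hypothetical stand-in for a QA Chip: ink attributes plus an ink amount."""
    ink_type: str
    ink_colour: str
    ink_amount: int  # arbitrary units


def refill_printer(reservoir: QAChip, printer: QAChip, requested: int) -> bool:
    """Return True when ink may be released, False when the refill is refused."""
    # Steps a and b: read the reservoir's ink attributes and compare them
    # with those the printer requires.
    if (reservoir.ink_type, reservoir.ink_colour) != (printer.ink_type, printer.ink_colour):
        return False
    # Step c.i: the reservoir must hold enough ink for the transfer.
    if requested <= 0 or requested > reservoir.ink_amount:
        return False
    # Step c.ii: decrement the refill device's record, increment the printer's.
    reservoir.ink_amount -= requested
    printer.ink_amount += requested
    # Step c.iii: the caller would now command the ink transfer mechanism.
    return True
```

In this sketch a refused refill leaves both ink amounts untouched, so the two records always sum to the same total.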

[6453] 2 Home Use Refill Station

[6454] 2.1 Functional Purpose

[6455] The functional purpose of the home refill station is as follows:

[6456] To refill ink into ink cartridges at home or in a small office.

[6457] A single ink cartridge is filled at a time.

[6458] To ensure that the correct ink present in the refill station is transferred to the correct ink cartridge.

[6459] To ensure that an accurate measure of ink is transferred from the refilling station to the ink cartridge during refills.

[6460] The refilling station provides the processing power required to perform refills of ink cartridges.

[6461] 2.2 Basic Components

[6462] FIG. 357 shows the components of a home refill station.

[6463] A home refill station will consist of one of the following ink refill units:

[6464] A single reservoir ink refill unit suitable for black ink (or any other single colour).

[6465] A multi reservoir ink refill unit suitable for coloured ink, for example CMY (Cyan, Magenta, Yellow).

[6466] 2.2.1 Ink Reservoir Unit

[6467] FIG. 358 shows the components of a three-ink reservoir unit.

[6468] The ink reservoir unit will consist of the following:

[6469] Multiple ink reservoirs or a single ink reservoir which stores ink. Each refill station will allow ink reservoirs of various capacities. When the ink reservoir empties out, it is replaced by another reservoir containing more ink of the same or different type or refilled (for example through a refill station as described in Section 3).

[6470] A QA Chip and associated circuitry in each of the ink reservoirs, which stores the amount of ink in the reservoir along with the attributes of the ink.

[6471] The electrical connections to each of the QA Chips.

[6472] 2.2.2 Ink Transfer Unit

[6473] The ink transfer unit will consist of the following:

[6474] Ink output device from each ink reservoir.

[6475] The output ink transfer mechanism, which controls the flow of ink from the ink refill unit to the ink cartridge and is controlled by the microprocessor assembly.

[6476] Final ink output devices to the cartridge interface assembly.

[6477] 2.2.3 Cartridge Interface Unit

[6478] This unit will provide the physical interface to the ink cartridges. Each ink cartridge interface unit will hold a single or multiple cartridges of particular physical dimension.

[6479] The cartridge interface unit can be removed from the ink refill unit and replaced with another interface unit to cater for other physically different cartridges.

[6480] 2.2.4 Microprocessor Assembly

[6481] The control connections for the ink transfer mechanism and the electrical connections of the QA Chip are connected to the microprocessor assembly. The microprocessor assembly oversees and controls the refill process.

[6482] The microprocessor assembly will communicate with a user interface to accept commands and provide responses for various refill operations.

[6483] 2.3 Ink Cartridge Description

[6484] Ink cartridges which will be refilled in a home refill station must have a QA Chip storing the following components:

[6485] Ink amount remaining.

[6486] Ink attributes (for example, ink type, ink characteristics, ink colour, ink manufacturer).

[6487] 2.4 Operational Procedure

[6488] The operational procedure can be divided into two parts:

[6489] Refilling of ink cartridges using the home refill station.

[6490] Refilling the ink reservoirs used in the refill station is discussed in Section 3.

[6491] 2.5 Refilling of Ink Cartridges Using the Home Refill Station

[6492] FIG. 359 shows the refill of ink cartridges in a home refill station.

[6493] The following is a description of the refilling of ink cartridges in the home refill station:

[6494] Load the ink cartridge into the cartridge interface unit of the ink refill unit. This will connect the QA Chip of the ink cartridge to the microprocessor assembly. It will also connect the ink output device of the ink refill unit to the ink cartridge.

[6495] The model number of the ink cartridge is read from the QA Chip by the microprocessor assembly controlling the ink refill units.

[6496] The microprocessor assembly will determine whether the ink refill unit is suitable for the ink cartridge model.

[6497] The refill option is selected on the microprocessor assembly through the user interface. The microprocessor assembly will then do the following:

[6498] a. Read the ink attributes (for example ink type, ink characteristics, ink colour, ink manufacturer etc) stored in the QA Chip of the ink cartridge. Refer to [1].

[6499] b. Compare the read ink attributes to the ink attribute list in the refill station. This may also require reading of the ink attributes stored in the QA Chip of the ink reservoirs in the refill unit.

[6500] c. Only if Step b is successful, then do the following:

[6501] i. Determine the amount of ink to be transferred by any or all of the following means, ensuring that the reservoir has enough ink for the transfer:

[6502] Fixed amount (e.g. based on a pre-programmed value, cartridge model or reservoir type).

[6503] User-selectable amount.

[6504] ii. Check that the ink reservoir in the ink refill unit has an adequate amount of ink to refill the ink cartridge.

[6505] iii. Decrement the QA Chip in the ink refill unit by the amount of ink transferred, and increment the QA Chip in the ink cartridge by the corresponding ink amount.

[6506] iv. If incrementing of the QA Chip with the ink amount is successful, then a command is sent to the ink transfer mechanism to release the ink to the ink cartridge through the output device.

[6507] 3 Commercial Refill Station

[6508] 3.1 Functional Purpose

[6509] The functional purpose of the commercial refill station is as follows:

[6510] To refill ink into ink cartridges that are taken to the refill station for refilling.

[6511] Multiple ink cartridges of different models can be refilled. To ensure that the correct ink present in the refill station is transferred to the ink cartridge.

[6512] To ensure that an accurate measure of ink is transferred from the refilling station to the ink cartridge during refills.

[6513] The refilling station provides all processing power required to perform refills of ink cartridges.

[6514] 3.2 Basic Components of the Refill Station

[6515] FIG. 360 shows the components of a commercial refill station.

[6516] A commercial refill station will consist of multiple ink refill units controlled by a single microprocessor assembly. Each ink refill unit can refill a single ink cartridge at a time.

[6517] Each ink refill unit will consist of the following sub units:

[6518] Ink reservoir unit

[6519] Switch unit

[6520] Ink transfer unit

[6521] Multiple cartridge interface unit

[6522] 3.2.1 Ink Reservoir Unit

[6524] FIG. 361 shows the components of an ink reservoir unit.

[6525] The ink reservoir unit will consist of the following:

[6526] Multiple ink reservoirs, which store ink. Each refill device will allow ink reservoirs of various capacities. When the ink reservoir empties out, it is replaced by another reservoir containing more ink of the same or different type or refilled. Refer to Section 3.5.

[6527] A QA Chip and associated circuitry in each of the ink reservoirs, which stores the amount of ink in the reservoir along with the attributes of the ink in digital format.

[6528] The electrical connections of each of the QA Chips are connected to the microprocessor assembly.

[6529] 3.2.2 Switch Unit

[6530] This unit will switch the inks selected from different ink reservoirs to the ink transfer unit to be dispensed into ink cartridges.

[6531] The switch unit will prevent mixing of any residual ink left in dispensing devices after each ink cartridge is refilled.

[6532] 3.2.3 Ink Transfer Unit

[6533] The ink transfer unit will consist of the following:

[6534] Ink output device from each ink reservoir.

[6535] An output ink transfer mechanism which controls the flow of ink from the ink refill unit to the ink cartridge and is controlled by the microprocessor assembly.

[6536] Final ink output devices to the multiple cartridge interface assembly.

[6537] 3.2.4 Multiple Cartridge Interface Unit

[6538] This unit will provide the physical interface to the ink cartridges. Each ink cartridge interface will hold cartridges of different physical dimensions.

[6539] Each cartridge interface unit can provide an interface for about 20 physically different cartridges.

[6540] The cartridge interface unit can be removed from the ink refill unit and replaced with another interface unit to cater for other physically different cartridges.

[6541] 3.2.5 Microprocessor Assembly with a User Interface

[6542] The control connections for the ink transfer mechanism and the electrical connections of the QA Chip are connected to the microprocessor assembly. The microprocessor assembly will oversee and control the refill process.

[6543] The microprocessor assembly will communicate with a user interface to accept commands and provide responses for various refill operations.

[6544] 3.3 Ink Cartridge Description

[6545] Ink cartridges which will be refilled in a commercial refill station must have a QA Chip storing the following components:

[6546] Ink amount remaining.

[6547] Ink attributes (for example, ink type, ink characteristics, ink colour, ink manufacturer).

[6548] 3.4 Operational Procedure

[6549] The operational procedure can be divided into two parts:

[6550] Refilling of ink cartridges using the commercial refill station.

[6551] Refilling the ink reservoirs used in the refill station is covered in Section 3.5.

[6552] 3.4.1 Refilling Ink Cartridges Using the Commercial Refill Station

[6553] FIG. 362 shows the refill of ink cartridges in a commercial refill station.

[6554] The following is a description of the refilling of ink cartridges in the commercial refill station:

[6555] Load the ink cartridge into the multiple cartridge interface unit of the ink refill unit. This will connect the QA Chip of the ink cartridge to the microprocessor assembly. It will also connect the ink output device of the ink refill unit to the ink cartridge.

[6556] The model number of the ink cartridge is automatically read from the QA Chip by the microprocessor assembly controlling the ink refill units.

[6557] The microprocessor assembly will determine whether the ink refill unit is suitable for the ink cartridge model.

[6558] The refill option is selected on the microprocessor assembly through the user interface. The microprocessor assembly will then do the following:

[6559] a. Read the ink attributes (for example ink type, ink characteristics, ink colour, ink manufacturer etc) stored in the QA Chip of the ink cartridge. Refer to [1].

[6560] b. Compare the read ink attributes to the ink attribute list in the refill station. This may also require reading of the ink attributes stored in the QA Chip of the ink reservoirs in the refill unit.

[6561] c. Only if Step b is successful, then do the following:

[6562] i. Determine the amount of ink to be transferred by any or all of the following means, ensuring that the reservoir has enough ink for the transfer:

[6563] Fixed amount (e.g. based on a pre-programmed value, cartridge model or reservoir type).

[6564] User-selectable amount.

[6565] ii. The microprocessor assembly will calculate the cost of the ink amount and ask the user for a payment method (credit card or cash). If the credit card option is selected, it will request a credit card number and interface to a payment system to complete the transaction before proceeding further.

[6566] iii. Decrement the QA Chip in the ink refill unit by the amount of ink transferred, and increment the QA Chip in the ink cartridge by the corresponding ink amount.

[6567] iv. If incrementing of the QA Chip with the ink amount is successful, then a command is sent to the ink transfer mechanism to release the ink to the ink cartridge through the output device.
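Purely as a sketch, the ordering of steps i to iv (determine amount, take payment, update both QA Chips, release ink only on success) might look as follows. The function commercial_refill, the price_per_unit parameter and the two callbacks are hypothetical and are not part of the described station. The text does not say what happens to the refill unit's record when the cartridge write fails, so the sketch simply leaves it decremented and reports that no ink was released.

```python
from typing import Callable, Tuple


def commercial_refill(reservoir_ink: int, cartridge_ink: int, amount: int,
                      price_per_unit: int,
                      take_payment: Callable[[int], bool],
                      write_cartridge: Callable[[int], bool]) -> Tuple[int, int, bool]:
    """Return (reservoir_ink, cartridge_ink, ink_released) after one attempt."""
    # Step i: the reservoir must hold enough ink for the transfer.
    if amount <= 0 or amount > reservoir_ink:
        return reservoir_ink, cartridge_ink, False
    # Step ii: calculate the cost and complete payment before proceeding.
    cost = amount * price_per_unit
    if not take_payment(cost):
        return reservoir_ink, cartridge_ink, False
    # Step iii: decrement the refill unit's record, then try the cartridge write.
    reservoir_ink -= amount
    new_cartridge_ink = cartridge_ink + amount
    # Step iv: release ink only if the cartridge QA Chip was updated.
    if not write_cartridge(new_cartridge_ink):
        return reservoir_ink, cartridge_ink, False
    return reservoir_ink, new_cartridge_ink, True
```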

[6568] 3.5 Refilling the Ink Reservoirs

[6569] The ink reservoirs of any ink refill device can be refilled recursively by the procedure described in Section 3.4.1, the only difference being that the ink reservoir takes the place of the ink cartridge.

[6570] 3.6 Commercial Refill Station for a Production Environment

[6571] This refill station resembles a commercial refill station but fills multiple ink cartridges of the same type at the same time. This will serve as a filling station for new cartridges in a production environment.

[6572] Logical Interface Specification for Preferred Form of QA Chip

[6573] 1 Introduction

[6574] This document defines the QA Chip Logical Interface, which provides authenticated manipulation of specific printer and consumable parameters. The interface is described in terms of data structures and the functions that manipulate them, together with examples of use. While the descriptions and examples are targeted towards the printer application, they are equally applicable in other domains.

[6575] 2 Scope

[6576] The document describes the QA Chip Logical Interface as follows:

[6577] data structures and their uses (Section 5 to Section 9).

[6578] functions, including inputs, outputs, signature formats, and a logical implementation sequence (Section 10 to Section 30).

[6579] typical functional sequences of printers and consumables, using the functions and data structures of the interface (Section 31 to Section 32).

[6580] The QA Chip Logical Interface is a logical interface, and is therefore implementation independent.

[6581] Although this document does not cover implementation details on particular platforms, expected implementations include:

[6582] Software only

[6583] Off-the-shelf cryptographic hardware.

[6584] ASICs, such as SBR4320 [2] and SOPEC [3] for physical insertion into printers and ink cartridges.

[6585] Smart cards.

[6586] 3 Nomenclature

[6587] 3.1 Symbols

[6588] The following symbolic nomenclature is used throughout this document:

TABLE 246 Summary of symbolic nomenclature
Symbol                   Description
F[X]                     Function F, taking a single parameter X
F[X, Y]                  Function F, taking two parameters, X and Y
X | Y                    X concatenated with Y
X ∧ Y                    Bitwise X AND Y
X ∨ Y                    Bitwise X OR Y (inclusive-OR)
X ⊕ Y                    Bitwise X XOR Y (exclusive-OR)
¬X                       Bitwise NOT X (complement)
X ← Y                    X is assigned the value Y
X ← {Y, Z}               The domain of assignment inputs to X is Y and Z
X = Y                    X is equal to Y
X ≠ Y                    X is not equal to Y
(symbol missing) X       Decrement X by 1 (floor 0)
(symbol missing) X       Increment X by 1 (modulo register length)
Erase X                  Erase Flash memory register X
SetBits[X, Y]            Set the bits of the Flash memory register X based on Y
Z ← ShiftRight[X, Y]     Shift register X right one bit position, taking input bit from Y and placing the output bit in Z
a.b                      Data field or member function ‘b’ in object a.
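For readers who prefer code to symbols, several of the operations in Table 246 can be mirrored on fixed-width integers as in the sketch below. The 8-bit register length, the function names, the choice to place X in the high-order bits on concatenation, and the choice to shift the input bit into the most significant position are all assumptions made for illustration; the table itself does not fix them.

```python
WIDTH = 8                  # assumed register length in bits
MASK = (1 << WIDTH) - 1


def concat(x: int, y: int, y_bits: int = WIDTH) -> int:
    """X | Y: X concatenated with Y (X assumed to occupy the high-order bits)."""
    return (x << y_bits) | (y & ((1 << y_bits) - 1))


def bit_not(x: int) -> int:
    """Bitwise NOT X, confined to the register width."""
    return ~x & MASK


def decrement(x: int) -> int:
    """Decrement X by 1, with a floor of 0."""
    return max(x - 1, 0)


def increment(x: int) -> int:
    """Increment X by 1, modulo the register length."""
    return (x + 1) & MASK


def shift_right(x: int, y: int):
    """Z <- ShiftRight[X, Y]: returns (new X, Z).

    Z is the bit shifted out of the low end; the input bit Y is assumed
    to enter at the high end.
    """
    z = x & 1
    new_x = ((x >> 1) | ((y & 1) << (WIDTH - 1))) & MASK
    return new_x, z
```

Bitwise AND, OR and XOR map directly onto Python's `&`, `|` and `^` operators and so need no helper.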

[6589] 3.2 Pseudocode

[6590] 3.2.1 Asynchronous

[6591] The following pseudocode:

[6592] var=expression

[6593] means the var signal or output is equal to the evaluation of the expression.

[6594] 3.2.2 Synchronous

[6595] The following pseudocode:

[6596] var←expression

[6597] means the var register is assigned the result of evaluating the expression during this cycle.

[6598] 3.2.3 Expression

[6599] Expressions are defined using the nomenclature in Table 246 above. Therefore:

[6600] var=(a=b)

[6601] is interpreted as the var signal is 1 if a is equal to b, and 0 otherwise.

[6602] 4 Terms

[6603] 4.1 QA Device and System

[6604] An instance of a QA Chip Logical Interface (on any platform) is a QA Device.

[6605] QA Devices cannot talk directly to each other. A System is a logical entity which has one or more QA Devices connected logically (or physically) to it, and calls the functions on the QA Devices. The system is considered secure and the program running on the system is considered to be trusted.

[6606] 4.2 Types of QA Devices

[6607] 4.2.1 Trusted QA Device

[6608] The Trusted QA Device forms an integral part of the system itself and resides within the trusted environment of the system. It enables the system to extend trust to external QA Devices. The Trusted QA Device is only trusted because the system itself is trusted.

[6609] 4.2.2 External Untrusted QA Device

[6610] The External untrusted QA Device is a QA Device that resides external to the trusted environment of the system and is therefore untrusted. The purpose of the QA Chip Logical Interface is to allow the external untrusted QA Devices to become effectively trusted. This is accomplished when a Trusted QA Device shares a secret key with the external untrusted QA Device, or with a Translation QA Device (see below).

[6611] In a printing application external untrusted QA Devices would typically be instances of SBR4320 implementations located in a consumable or the printer.

[6612] 4.2.3 Translation QA Device

[6613] A Translation QA Device is used to translate signatures between QA Devices and extend effective trust when secret keys are not directly shared between QA Devices.

[6614] The Translation QA Device must share a secret key with the Trusted QA Device that allows the Translation QA Device to effectively become trusted by the Trusted QA Device and hence trusted by the system. The Translation QA Device shares a different secret key with another external untrusted QA Device (which may in fact be a Translation QA Device etc). Although the Trusted QA Device doesn't share (know) the key of the external untrusted QA Device, signatures generated by that untrusted device can be translated by the Translation QA Device into signatures based on the key that the Trusted QA Device does know, and thus extend trust to the otherwise untrusted external QA Device.

[6615] In a SoPEC-based printing application, the Printer QA Device acts as a Translation QA Device since it shares a secret key with the SoPEC, and a different secret key with the ink cartridges.

[6616] 4.2.4 Consumable QA Device

[6617] A Consumable QA Device is an external untrusted QA Device located in a consumable. It typically contains details about the consumable, including how much of the consumable remains.

[6618] In a printing application the consumable QA Device is typically found in an ink cartridge and is referred to as an Ink QA Device, or simply Ink QA since ink is the most common consumable for printing applications. However, other consumables in printing applications include media and impression counts, so consumable QA Device is more generic.

[6619] 4.2.5 Printer QA Device

[6620] A Printer QA Device is an external untrusted device located in the printer. It contains details about the operating parameters for the printer, and is often referred to as a Printer QA.

[6621] 4.2.6 Value Upgrader QA Device

[6622] A Value Upgrader QA Device contains the necessary functions to allow a system to write an initial value (e.g. an ink amount) into another QA Device, typically a consumable QA Device. It also allows a system to refill/replenish a value in a consumable QA Device after use.

[6623] Whenever a value upgrader QA Device increases the amount of value in another QA Device, the value in the value upgrader QA Device is correspondingly decreased. This means the value upgrader QA Device cannot create value; it can only pass on whatever value it itself has been issued with. Thus a value upgrader QA Device can itself be replenished or topped up by another value upgrader QA Device.
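The conservation property described above (value can be moved but never created) can be sketched as follows. ValueStore and upgrade are hypothetical names, and the sketch omits the signatures that would authorise the transfer in a real QA Device.

```python
class ValueStore:
    """Hypothetical holder of a consumable value, e.g. an ink amount."""

    def __init__(self, value: int):
        self.value = value


def upgrade(upgrader: ValueStore, target: ValueStore, amount: int) -> bool:
    """Move `amount` from the upgrader to the target; refuse to create value."""
    if amount < 0 or amount > upgrader.value:
        return False
    upgrader.value -= amount   # the upgrader is correspondingly decreased
    target.value += amount     # the target is increased by the same amount
    return True
```

Because the same function works when the target is itself an upgrader, it also covers the case of one value upgrader topping up another.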

[6624] An example of a value upgrader is an Ink Refill QA Device, which is used to fill/refill ink amount in an Ink QA Device.

[6625] 4.2.7 Parameter Upgrader QA Device

[6626] A Parameter Upgrader QA Device contains the necessary functions to allow a system to write an initial parameter value (e.g. a print speed) into another QA Device, typically a printer QA Device. It also allows a system to change that parameter value at some later date.

[6627] A parameter upgrader QA Device is able to perform a fixed number of upgrades, and this number is effectively a consumable value. Thus the number of available upgrades decreases by 1 with each upgrade, and can be replenished by a value upgrader QA Device.

[6628] 4.2.8 Key Programmer QA Device

[6629] Secret batch keys are inserted into QA Devices during instantiation (e.g. manufacture). These keys must be replaced by the final secret keys when the purpose of the QA Device is known. The Key Programmer QA Device implements all necessary functions for replacing keys in other QA Devices.

[6630] 4.3 Signature

[6631] Digital signatures are used throughout the authentication protocols of the QA Chip Logical Interface.

[6632] A signature is produced by passing data plus a secret key through a keyed hash function. The signature proves that the data was signed by someone who knew the secret key.

[6633] The signature function used throughout the QA Chip Logical Interface is HMAC-SHA1 [1].
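As a minimal sketch of such a keyed hash, HMAC-SHA1 is available in the Python standard library. The helper names sign and verify are hypothetical; the exact message layout signed by a QA Device is not given in this section and is not modelled here.

```python
import hashlib
import hmac


def sign(key: bytes, data: bytes) -> bytes:
    """Return the 160-bit (20-byte) HMAC-SHA1 signature of data under key."""
    return hmac.new(key, data, hashlib.sha1).digest()


def verify(key: bytes, data: bytes, signature: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(key, data), signature)
```

Only a party holding the same key can produce a signature that verify accepts, which is the sense in which the signature proves knowledge of the secret key.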

[6634] 4.3.4 Authenticated Read

[6635] This is a read of data from a non-trusted QA Device that also includes a check of the signature (see Section 4.3.3). When the System determines that the signature is correct for the returned data (e.g. by asking a trusted QA Device to test the signature) then the System is able to trust that the data has not been tampered with en route from the read, and was actually stored on the non-trusted QA Device.
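One way to picture an authenticated read is sketched below. The classes UntrustedDevice and TrustedDevice, the use of a random challenge, and the signed message layout (challenge followed by data) are assumptions made for this sketch only; the actual signature format is defined elsewhere in the specification.

```python
import hashlib
import hmac
import os


class UntrustedDevice:
    """Hypothetical external QA Device: returns its data with a signature."""

    def __init__(self, key: bytes, data: bytes):
        self._key = key
        self.data = data

    def read(self, challenge: bytes):
        sig = hmac.new(self._key, challenge + self.data, hashlib.sha1).digest()
        return self.data, sig


class TrustedDevice:
    """Hypothetical trusted QA Device: tests signatures on the system's behalf."""

    def __init__(self, key: bytes):
        self._key = key

    def test_signature(self, challenge: bytes, data: bytes, sig: bytes) -> bool:
        expected = hmac.new(self._key, challenge + data, hashlib.sha1).digest()
        return hmac.compare_digest(expected, sig)


def authenticated_read(trusted: TrustedDevice, external: UntrustedDevice) -> bytes:
    """Read data from the external device and accept it only if the signature checks."""
    challenge = os.urandom(20)  # fresh value so an old response cannot be replayed
    data, sig = external.read(challenge)
    if not trusted.test_signature(challenge, data, sig):
        raise ValueError("signature check failed")
    return data
```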

[6636] 4.3.5 Authenticated Write

[6637] An authenticated write is a write to the data storage area in a QA Device where the write request includes both the new data and a signature. The signature is based on a key that has write access permissions to the region of data in the QA Device, and proves to the receiving QA Device that the writer has the authority to perform the write. For example, a Value Upgrader Refilling Device is able to authorize a system to perform an authenticated write to upgrade a Consumable QA Device (e.g. to increase the amount of ink in an Ink QA Device).

[6638] The QA Device that receives the write request checks that the signature matches the data (so that it hasn't been tampered with en route) and also that the signature is based on the correct authorization key.

[6639] An authenticated write can be followed by an authenticated read to ensure (from the system's point of view) that the write was successful.

[6640] 4.3.6 Non-Authenticated Write

[6641] A non-authenticated write is a write to the data storage area in a QA Device where the write request includes only the new data (and no signature). This kind of write is used when the system wants to update areas of the QA Device that have no access-protection.

[6642] The QA Device verifies that the destination of the write request has access permissions that permit anyone to write to it. If access is permitted, the QA Device simply performs the write as requested. A non-authenticated write can be followed by an authenticated read to ensure (from the system's point of view) that the write was successful.
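The difference between the two kinds of write can be sketched as below. QADeviceStore, sign_write, the four-word store and the signed message layout (one index byte followed by a 32-bit value) are hypothetical simplifications; the real permission model is more detailed than a single set of open words.

```python
import hashlib
import hmac


def sign_write(key: bytes, index: int, value: int) -> bytes:
    """Signature a writer would attach to an authenticated write request."""
    message = index.to_bytes(1, "big") + value.to_bytes(4, "big")
    return hmac.new(key, message, hashlib.sha1).digest()


class QADeviceStore:
    """Hypothetical data area with one write key and a set of unprotected words."""

    def __init__(self, write_key: bytes, words: int = 4):
        self._write_key = write_key
        self.words = [0] * words
        self.open_words = set()  # indices that anyone may write

    def authenticated_write(self, index: int, value: int, signature: bytes) -> bool:
        # The signature must match the data and be based on the authorised key.
        expected = sign_write(self._write_key, index, value)
        if not hmac.compare_digest(expected, signature):
            return False
        self.words[index] = value
        return True

    def non_authenticated_write(self, index: int, value: int) -> bool:
        # Permitted only where the access permissions allow anyone to write.
        if index not in self.open_words:
            return False
        self.words[index] = value
        return True
```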

[6643] 4.3.7 Authorized Modification of Data

[6644] Authorized modification of data refers to modification of data via authenticated writes (see Section 4.3.5).

[6645] Table 2 provides a summary of the data structures used in the QA Chip Logical Interface.

TABLE 2 List of data structures
Group description | Name | Represented by | Size | Description
QA Device instance identifier | Chip Identifier | ChipId | 48 bits | Unique identifier for this QA Device.
Key and key related data | Number of Keys | NumKeys | 8 | Number of key slots available in this QA Device.
Key and key related data | Key | K | 160 bits per key | K is the secret key used for calculating signatures. K^(n) is the key stored in the nth key slot.
Key and key related data | Key Identifier | KeyId | 31 bits per key | Unique identifier for each key. KeyId^(n) is the key identifier for the key stored in slot n.
Key and key related data | KeyLock | KeyLock | 1 bit per key | Flag indicates whether the key is locked in the corresponding slot or not. KeyLock^(n) is the key lock flag for slot n.
Operating and state data | Number of Memory Vectors | NumVectors | 4 | Number of 512 bit memory vectors in this QA Device.
Operating and state data | Memory Vector | M | 512 bits per M | M is a 512 bit memory vector. The 512-bit vector is divided into 16 × 32 bit words.
Operating and state data | | M⁰ | | M⁰ stores application specific data that is protected by access permissions for key-based and non-key based writes.
Operating and state data | | M¹ | | M¹ stores the attributes for M⁰, and is write-once-only.
Operating and state data | | M²⁺ | | M²⁺ stores application specific data that is protected only by non key-based access permissions.
Operating and state data | Permissions | P^(n) | 16 bits per P | Access permissions for each word of M¹⁺. n = number of M¹⁺ vectors.
Session data | Random Number | R | 160 bits | Current random number used to ensure time varying messages. Changes after each successful authentication or signature generation.

[6646] 6 Instance/Device Identifier

[6647] Each QA Device requires an identifier that allows unique identification of that QA Device by external systems, ensures that messages are received by the correct QA Device, and ensures that the same device can be used across multiple transactions.

[6648] Strictly speaking, the identifier only needs to be unique within the context of a key, since QA Devices only accept messages that are appropriately signed. However it is more convenient to have the instance identifier completely unique, as is the case with this design.

[6649] The identifier functionality is provided by ChipId.

[6650] 6.1 ChipId

[6651] ChipId is the unique 64-bit QA Device identifier. The ChipId is set when the QA Device is instantiated, and cannot be changed during the lifetime of the QA Device.

[6652] A 64-bit ChipId gives a maximum of 2^64 (approximately 1.8 × 10^19, or about 18,446,744 trillion) unique QA Devices.
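The size of a 64-bit identifier space can be checked with a few lines of arithmetic. The variable names are illustrative, and "trillion" is taken here in its short-scale sense of 10^12.

```python
CHIP_ID_BITS = 64

# Number of distinct values a 64-bit identifier can take.
unique_ids = 2 ** CHIP_ID_BITS

# The same count expressed in whole trillions (10**12).
trillions = unique_ids // 10 ** 12
```

Evaluating this gives unique_ids = 18,446,744,073,709,551,616, which is 18,446,744 whole trillions.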

[6653] 7 Key and Key Related Data

[6654] 7.1 NumKeys, K, KeyId, and KeyLock

[6655] Each QA Device contains a number of secret keys that are used for signature generation and verification. These keys serve two basic functions:

[6656] For reading, where they are used to verify that the read data came from the particular QA Device and was not altered en route.

[6657] For writing, where they are used to ensure only authorised modification of data.

[6658] Both of these functions are achieved by signature generation; a key is used to generate a signature for subsequent transmission from the device, and to generate a signature to compare against a received signature.

[6659] The number of secret keys in a QA Device is given by NumKeys. For this version of the QA Chip Logical Interface, NumKeys has a maximum value of 8.

[6660] Each key is referred to as K, and the subscripted form K_(n) refers to the nth key where n has the range 0 to NumKeys-1 (i.e. 0 to 7). For convenience we also refer to the nth key as being the key in the nth keyslot.

[6661] The length of each key is 160 bits. 160 bits was chosen because the output signature length from the signature generation function (HMAC-SHA1) is 160 bits, and a key longer than 160 bits does not add to the security of the function.

[6662] The security of the digital signatures relies upon keys being kept secret. To safeguard the security of each key, keys should be generated in a way that is not deterministic. Ideally each key should be programmed with a physically generated random number, gathered from a physically random phenomenon. Each key is initially programmed during QA Device instantiation.

[6663] Since all keys must be kept secret and must never leave the QA Device, each key has a corresponding 31-bit KeyId which can be read to determine the identity or label of the key without revealing the value of the key itself. Since the relationship between keys and KeyIds is 1:1, a system can read all the KeyIds from a QA Device and know which keys are stored in each of the keyslots.

[6664] Finally, each keyslot has a corresponding 1-bit KeyLock status indicating whether the key in that slot/position is allowed to be replaced (securely replaced, and only if the old key is known). Once a key has been locked into a slot, it cannot be unlocked, i.e. it is the final key for that slot. A key can only be used to perform authenticated writes of data when it has been locked into its keyslot (i.e. its KeyLock status=1). Refer to Section 8.1.1.5 for further details.

[6665] Thus each of the NumKeys keyslots contains a 160-bit key, a 31-bit KeyId, and a 1-bit KeyLock.
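A keyslot as summarised above could be modelled along the following lines. KeySlot, its method names and the direct comparison against the old key are hypothetical simplifications; in the described system key replacement is carried out through signed messages rather than by presenting the old key in the clear.

```python
import hmac
from dataclasses import dataclass

KEY_BITS = 160
KEY_ID_BITS = 31
MAX_KEYS = 8


def _check(key: bytes, key_id: int) -> None:
    if len(key) * 8 != KEY_BITS:
        raise ValueError("key must be 160 bits")
    if not 0 <= key_id < (1 << KEY_ID_BITS):
        raise ValueError("KeyId must fit in 31 bits")


@dataclass
class KeySlot:
    key: bytes           # 160-bit secret; never leaves the device
    key_id: int          # 31-bit readable label for the key
    locked: bool = False # 1-bit KeyLock

    def __post_init__(self):
        _check(self.key, self.key_id)

    def can_sign_writes(self) -> bool:
        """A key may authorise writes only once it is locked into its slot."""
        return self.locked

    def replace(self, old_key: bytes, new_key: bytes, new_key_id: int,
                lock: bool = False) -> bool:
        """Replace the key only if the slot is unlocked and the old key is known."""
        if self.locked or not hmac.compare_digest(old_key, self.key):
            return False
        _check(new_key, new_key_id)
        self.key, self.key_id, self.locked = new_key, new_key_id, lock
        return True
```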

[6666] 7.2 Common and Variant Signature Generation

[6667] To create a digital signature, we pass the data to be signed together with a secret key through a key dependent one-way hash function. The key dependent one-way hash function used throughout the QA Chip Logical Interface is HMAC-SHA1 [1].

[6668] Signatures are only of use if they can be validated, i.e. QA Device A produces a signature for data and QA Device B can check if the signature was valid for that particular data. This implies that A and B must share some secret information so that they can generate equivalent signatures.

[6669] Common key signature generation is when QA Device A and QA Device B share the exact same key, i.e. key K_(A)=key K_(B). Thus the signature for a message produced by A using K_(A) can be equivalently produced by B using K_(B). In other words SIG_(KA)(message)=SIG_(KB)(message) because key K_(A)=key K_(B).

[6670] Variant key signature generation is when QA Device B holds a base key, and QA Device A holds a variant of that key such that K_(A)=owf(K_(B),U_(A)) where owf is a one-way function based upon the base key (K_(B)) and a unique number in A (U_(A)). Thus A can produce SIG_(KA)(message), but for B to produce an equivalent signature it must produce K_(A) by reading U_(A) from A and using its base key K_(B). K_(A) is referred to as a variant key and K_(B) is referred to as the base/common key. Therefore, B can produce signatures equivalent to those of many QA Devices, each of which has its own unique variant of K_(B). Since ChipId is unique to a given QA Device, we use that as U_(A). A one-way function is required to create K_(A) from K_(B) or it would be possible to derive K_(B) if K_(A) were exposed.
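The relationship K_(A)=owf(K_(B),U_(A)) can be illustrated with the sketch below. The choice of HMAC-SHA1 as the one-way function owf is an assumption made here for illustration (the text only requires that owf be one-way), and the key and ChipId values are arbitrary examples.

```python
import hashlib
import hmac


def owf(base_key: bytes, unique: bytes) -> bytes:
    """Assumed one-way function: HMAC-SHA1 of the unique number under the base key."""
    return hmac.new(base_key, unique, hashlib.sha1).digest()


def sig(key: bytes, message: bytes) -> bytes:
    """SIG_K(message) using HMAC-SHA1."""
    return hmac.new(key, message, hashlib.sha1).digest()


# Device B holds the base key; device A holds only its own variant key.
k_b = bytes(range(20))                               # example 160-bit base key
chip_id_a = (0x0123456789ABCDEF).to_bytes(8, "big")  # example U_A (ChipId of A)
k_a = owf(k_b, chip_id_a)

# A signs with K_A; B re-derives K_A from K_B and A's ChipId, then signs.
sig_from_a = sig(k_a, b"message")
sig_from_b = sig(owf(k_b, chip_id_a), b"message")
```

The two signatures are equal, while a device with a different ChipId would hold a different variant key and so could not reproduce either of them.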

[6671] Common key signature generation is used when A and B are equally available to an attacker. For example, Printer QA Devices and Ink QA Devices are equally available to attackers (both are commonly available to an attacker), so shared keys between these two devices should be common keys.

[6672] Variant key signature generation is used when B is not readily available to an attacker, and A is readily available to an attacker. If an attacker is able to determine K_(A), they will not know K_(A) for any other QA Device of class A, and they will not be able to determine K_(B).

[6673] The QA Device producing or testing a signature needs to know if it must use the common or variant means of signature generation. Likewise, when a key is stored in a QA Device, the status of the key (whether it is a base or variant key) must be stored along with it for future reference. Both of these requirements are met using the KeyId as follows:

[6674] The 31-bit KeyId is broken into two parts:

[6675] A 30-bit unique identifier for the key. Bits 30-1 represent the Id.

[6676] A 1-bit Variant Flag, which represents whether the key is a base key or a variant key. Bit 0 represents the Variant Flag.

[6677] Table 247 describes the relationship of the Variant Flag with the key.

TABLE 247 Variant Flag representation
Value    Key represented
0        Base key
1        Variant key
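The split of the 31-bit KeyId into a 30-bit Id (bits 30-1) and a 1-bit Variant Flag (bit 0) amounts to simple bit packing, sketched below. The function names are hypothetical.

```python
ID_BITS = 30


def pack_key_id(ident: int, variant: bool) -> int:
    """Build a 31-bit KeyId: Id in bits 30-1, Variant Flag in bit 0."""
    if not 0 <= ident < (1 << ID_BITS):
        raise ValueError("Id must fit in 30 bits")
    return (ident << 1) | int(variant)


def unpack_key_id(key_id: int):
    """Return (Id, is_variant) from a 31-bit KeyId."""
    return key_id >> 1, bool(key_id & 1)
```

A base key and its variants therefore share the same Id and differ only in bit 0 of the KeyId.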

[6678] 7.2.1 Equivalent Signature Generation Between QA Devices

[6679] Equivalent signature generation between 4 QA Devices A, B, C and D is shown in FIG. 363. Each device has a single key. KeyId.Id of all four keys are the same i.e. KeyId_(A).Id=KeyId_(B).Id=KeyId_(C).Id=KeyId_(D).Id.

[6680] If KeyId_(A).VariantFlag=0 and KeyId_(B).VariantFlag=0, then a signature produced by A can be equivalently produced by B because K_(A)=K_(B).

[6681] If KeyId_(B).VariantFlag=0 and KeyId_(C).VariantFlag=1, then a signature produced by C is equivalently produced by B because K_(C)=f(K_(B), ChipId_(C)).

[6682] If KeyId_(C).VariantFlag=1 and KeyId_(D).VariantFlag=1, then a signature produced by C cannot be equivalently produced by D because there is no common base key between the two devices.

[6683] If KeyId_(D).VariantFlag=1 and KeyId_(A).VariantFlag=0, then a signature produced by D can be equivalently produced by A because K_(D)=f(K_(A), ChipId_(D)).
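The four cases above suggest a simple rule, sketched below in Python: a device can equivalently produce another device's signature when both keys share the same KeyId.Id and the checking device holds the base form of the key. This generalisation is inferred from the four cases only, and the DeviceKey type and can_reproduce function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceKey:
    ident: int         # KeyId.Id
    variant_flag: int  # 0 = base key, 1 = variant key


def can_reproduce(producer: DeviceKey, checker: DeviceKey) -> bool:
    """True if `checker` can equivalently produce a signature made by `producer`."""
    if producer.ident != checker.ident:
        return False
    # A base key either equals the producer's key, or can be used to derive
    # the producer's variant key from the producer's ChipId.
    return checker.variant_flag == 0
```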

[6684] 8 Operating and State Data

[6685] The primary purpose of a QA Device is to securely hold application-specific data. For example if the QA Device is an Ink QA Device it may store ink characteristics and the amount of ink-remaining. If the QA Device is a Printer QA Device it may store the maximum speed and width of printing.

[6686] For secure manipulation of data:

[6687] Data must be clearly identified (includes typing of data).

[6688] Data must have clearly defined access criteria and permissions.

[6689] The QA Chip Logical Interface contains structures to permit these activities.

[6690] The QA Device contains a number of kinds of data with differing access requirements:

[6691] Data that can be decremented by anyone, but only increased in an authorised fashion e.g. the amount of ink-remaining in an ink cartridge.

[6692] Data that can only be decremented in an authorised fashion e.g. the number of times a Parameter Upgrader QA Device has upgraded another QA Device.

[6693] Data that is normally read-only, but can be written to (changed) in an authorised fashion e.g. the operating parameters of a printer.

[6694] Data that is always read-only and doesn't ever need to be changed e.g. ink attributes or the serial number of an ink cartridge or printer.

[6695] Data that is written by QACo/Silverbrook, and must not be changed by the OEM or end user e.g. a licence number containing the OEM's identification that must match the software in the printer.

[6696] Data that is written by the OEM and must not be changed by the end-user e.g. the machine number that filled the ink cartridge with ink (for problem tracking).

[6697] 8.1 M

[6698] M is the general term for all of the memory (or data) in a QA Device. M is further subscripted to refer to those different parts of M that have different access requirements as follows:

[6699] M₀ contains all of the data that is protected by access permissions for key-based (authenticated) and non-key-based (non-authenticated) writes.

[6700] M₁ contains the type information and access permissions for the M₀ data, and has write-once permissions (each sub-part of M₁ can only be written to once) to avoid the possibility of changing the type or access permissions of something after it has been defined.

[6701] M₂, M₃ etc., referred to as M₂₊, contain all the data that can be updated by anyone until the permissions for those sub-parts of M₂₊ have changed from read/write to read-only.

[6702] While all QA Devices must have at least M₀ and M₁, the exact number of memory vectors (M_(n)s) available in a particular QA Device is given by NumVectors. In this version of the QA Chip Logical Interface there are exactly 4 memory vectors, so NumVectors=4.

[6703] Each M_(n) is 512 bits in length, and is further broken into 16×32 bit words. The ith word of M_(n) is referred to as M_(n)[i]. M_(n)[0] is the least significant word of M_(n), and M_(n)[15] is the most significant word of M_(n).

[6704] 8.1.1 M₀ and M₁

[6705] In the general case of data storage, it is up to the external accessor to interpret the bits in any way it wants. Data structures can be arbitrarily arranged as long as the various pieces of software and hardware that interpret those bits do so consistently. However if those bits have value, as in the case of a consumable, it is vital that the value cannot be increased without appropriate authorisation, and that one type of value cannot be added to another incompatible kind e.g. dollars should never be added to yen.

[6706] Therefore M₀ is divided into a number of fields, where each field has a size, a position, a type and a set of permissions. M₀ contains all of the data that requires authenticated write access (one data element per field), and M₁ contains the field information i.e. the size, type and access permissions for the data stored in M₀.

[6707] Each 32-bit word of M₁ defines a field. Therefore there is a maximum of 16 defined fields. M₁[0] defines field 0, M₁[1] defines field 1 and so on. Each field is defined in terms of:

[6708] size and position, to permit external accessors to determine where a data item is

[6709] type, to permit external accessors to determine what the data represents

[6710] permissions, to ensure appropriate access to the field by external accessors.

[6711] The 32-bit value M₁[n] defines the conceptual field attributes for field n as follows:

With regard to consistency of interpretation, the type, size and position information stored in the various words of M₁ allows a system to determine the contents of the corresponding fields (in M₀) held in the QA Device. For example, a 3-color ink cartridge may have an Ink QA Device that holds the amount of cyan ink in field 0, the amount of magenta ink in field 1, and the amount of yellow ink in field 2, while another single-color Ink QA Device may hold the amount of yellow ink in field 0, where the sizes of the fields in the two Ink QA Devices are different.

[6712] A field must be defined (in M₁) before it can be written to (in M₀). At QA Device instantiation, the whole of M₀ is 0 and no fields are defined (all of M₁ is 0). The first field (field 0) can only be created by writing an appropriate value to M₁[0]. Once field 0 has been defined, the words of M₀ corresponding to field 0 can be written to (via the appropriate permissions within the field definition M₁[0]).

[6713] Once a field has been defined (i.e. M₁[n] has been written to), the size, type and permissions for that field cannot be changed i.e. M₁ is write-once. Otherwise, for example, a field could be defined to be lira and given an initial value, then the type changed to dollars.

[6714] The size of a field is measured in terms of the number of consecutive 32-bit words it occupies.

[6715] Since there are only 16×32-bit words in M₀, there can only be 16 fields when all 16 fields are defined to be 1 word sized each. Likewise, the maximum size of a field is 512 bits when only a single field is defined, and it is possible to define two fields of 256-bits each.

[6716] Once field 0 has been created, field 1 can be created, and so on. When enough fields have been created to allocate all of M₀, the remaining words in M₁ are available for write-once general data storage purposes.

[6717] It must be emphasised that when a field is created the permissions for that field are final and cannot be changed. This also means that any keys referred to by the field permissions must be already locked into their keyslots. Otherwise someone could set up a field's permissions such that the key in a particular keyslot has write access to that field without any guarantee that the desired key will ever be stored in that slot (thus allowing potential mis-use of the field's value).

[6718] 8.1.1.1 Field Size and Position

[6719] A field's size and position are defined by means of 4 bits (referred to as EndPos) that point to the least significant word of the field, with an implied position of the field's most significant word. The implied position of field 0's most significant word is M₀[15]. The positions and sizes of all fields can therefore be calculated by starting from field 0 and working upwards until all the words of M₀ have been accounted for.

[6720] The default value of M₁[0] is 0, which means field0.endPos=0. Since field0.startPos=15, field 0 is the only field and is 16 words long.

8.1.1.1.1 EXAMPLE

[6721] Suppose for example, we want to allocate 4 fields as follows:

[6722] field 0: 128 bits (4×32-bit words)

[6723] field 1: 32 bits (1×32-bit word)

[6724] field 2: 160 bits (5×32-bit words)

[6725] field 3: 192 bits (6×32-bit words)

[6726] Field 0's position and size is defined by M₁[0], and has an assumed start position of 15, which means the most significant word of field 0 must be in M₀[15]. Field 0 therefore occupies M₀[12] through to M₀[15], and has an endPos value of 12.

[6727] Field 1's position and size is defined by M₁[1], and has an assumed start position of 11 (i.e. M₁[0].endPos−1). Since it has a length of 1 word, field 1 therefore occupies only M₀[11] and its end position is the same as its start position i.e. its endPos value is 11.

[6728] Likewise field 2's position and size is defined by M₁[2], and has an assumed start position of 10 (i.e. M₁[1].endPos−1). Since it has a length of 5 words, field 2 therefore occupies M₀[6] through to M₀[10] and has an endPos value of 6.

[6729] Finally, field 3's position and size is defined by M₁[3], and has an assumed start position of 5 (i.e. M₁[2].endPos−1). Since it has a length of 6 words, field 3 therefore occupies M₀[0] through to M₀[5] and has an endPos value of 0.

[6730] Since all 16 words of M₀ are now accounted for in the 4 fields, the remaining words of M₁ (i.e. M₁[4] through to M₁[15]) are ignored, and can be used for any write-once (and thence read-only) data.

[6731] FIG. 365 shows the same example in diagrammatic format.
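The arithmetic of the example can be reproduced with a short Python sketch that derives each field's EndPos from a list of desired field sizes. The function name end_positions is hypothetical, and the sketch assumes a complete layout in which the fields together fill all 16 words of M₀.

```python
def end_positions(sizes_in_words):
    """Return the EndPos value for each field, given field sizes in 32-bit words."""
    if sum(sizes_in_words) != 16 or any(size < 1 for size in sizes_in_words):
        raise ValueError("fields must be at least 1 word and fill all 16 words of M0")
    result = []
    start_pos = 15                       # implied msw of field 0 is M0[15]
    for size in sizes_in_words:
        end_pos = start_pos - size + 1   # lsw of this field
        result.append(end_pos)
        start_pos = end_pos - 1          # implied msw of the next field
    return result
```

For the four fields of the example (4, 1, 5 and 6 words), this yields the EndPos values 12, 11, 6 and 0 given above.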

[6732] 8.1.1.1.2 Determining the Number of Fields

[6733] The following pseudocode illustrates a means of determining the number of fields:

fieldNum FindNumFields(M1)
  startPos ← 15
  fieldNum ← 0
  While (fieldNum < 16)
    endPos ← M1[fieldNum].endPos
    If (endPos > startPos)
      # error in this field... so must be an attack
      attackDetected( ) # most likely clears all keys and data
    EndIf
    fieldNum++
    If (endPos = 0)
      return fieldNum # is already incremented
    Else
      startPos ← endPos − 1 # endPos must be > 0
    EndIf
  EndWhile
  # error if get here since 16 fields are consumed in 16 words at most
  attackDetected( ) # most likely clears all keys and data
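A runnable Python rendering of the FindNumFields pseudocode is sketched below. It assumes the caller has already extracted the 16 EndPos values from M₁ into a list, and it models attackDetected( ) as a hypothetical AttackDetected exception rather than clearing keys and data.

```python
class AttackDetected(Exception):
    """Stand-in for attackDetected(); a real device would most likely clear keys and data."""


def find_num_fields(end_pos):
    """end_pos[i] is the EndPos value of M1[i]; returns the number of defined fields."""
    start_pos = 15
    for field_num in range(16):
        e = end_pos[field_num]
        if e > start_pos:
            # A field cannot end above its implied start position.
            raise AttackDetected(f"invalid EndPos in field {field_num}")
        if e == 0:
            return field_num + 1         # this field reaches M0[0], so it is the last
        start_pos = e - 1
    # 16 fields consume at most 16 words, so this point should be unreachable.
    raise AttackDetected("no terminating field found")
```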

[6734] 8.1.1.1.3 Determining the Sizes of All Fields

[6735] The following pseudocode illustrates a means of determining the sizes of all valid fields:

FindFieldSizes(M1, fieldSize[ ])
  numFields ← FindNumFields(M1) # assumes that FindNumFields does all checking
  startPos ← 15
  fieldNum ← 0
  While (fieldNum < numFields)
    endPos ← M1[fieldNum].endPos
    fieldSize[fieldNum] = startPos − endPos + 1
    startPos ← endPos − 1 # endPos must be > 0
    fieldNum++
  EndWhile
  While (fieldNum < 16)
    fieldSize[fieldNum] ← 0
    fieldNum++
  EndWhile
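The size calculation can likewise be sketched in Python. As in the pseudocode, the sketch assumes the EndPos values have already been validated, and it takes them as a plain list of 16 integers (a hypothetical representation of M₁).

```python
def find_field_sizes(end_pos):
    """Return 16 field sizes in words; fields that are not defined get size 0."""
    sizes = [0] * 16
    start_pos = 15
    for field_num in range(16):
        e = end_pos[field_num]
        sizes[field_num] = start_pos - e + 1
        if e == 0:
            break                        # last field: all of M0 is accounted for
        start_pos = e - 1
    return sizes
```

For a valid layout the returned sizes always sum to 16, the number of words in M₀.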

[6736] 8.1.1.2 Field Type

[6737] The system must be able to identify the type of data stored in a field so that it can perform operations using the correct data. For example, a printer system must be able to identify which of a consumable's fields are ink fields (and which field is which ink) so that the ink usage can be correctly applied during printing.

[6738] A field's type is defined by 15 bits. Table 332 in Appendix A lists the field types that are specifically required by the QA Chip Logical Interface and therefore apply across all applications.

[6739] The default value of M₁[0] is 0, which means field0.type=0 (i.e. non-initialised).

[6740] Strictly speaking, the type need only be interpreted by all who can securely read and write to that field i.e. within the context of one or more keys. However it is convenient if possible to keep all types unique for simple identification of data across all applications.

[6741] In the general case, an external system communicating with a QA Device can identify the data stored in M₀ in the following way:

[6742] Read the KeyId of the key that has permission to write to the field. This will give a broad identification of the data type, which may be sufficient for certain applications.

[6743] Read the type attribute for the field to narrow down the identity within the broader context of the KeyId.

[6744] For example, the printer system can read the KeyId to deduce that the data stored in a field can be written to via the HP_Network_InkRefill key, which means that any data is of the general ink category known to HP Network printers. By further reading the type attribute for the field the system can determine that the ink is Black ink.

[6745] 8.1.1.3 Field Permissions

[6746] All fields can be read by everyone. However writes to fields are governed by 13 bits of permissions that are present in each field's attribute definition. The permissions describe who can do what to a specific field.

[6747] Writes to fields can either be authenticated (i.e. the data to be written is signed by a key and this signature must be checked by the receiving device before write access is given) or non-authenticated (i.e. the data is not signed by a key). Therefore we define a single bit (AuthRW) that specifies whether authenticated writes are permitted, and a single bit (NonAuthRW) specifying whether non-authenticated writes are permitted. Since it is pointless to permit both authenticated and non-authenticated writes to write any value (the authenticated writes are pointless), we further define the case when both bits are set to be interpreted as authenticated writes are permitted, but non-authenticated writes only succeed when the new value is less than the previous value i.e. the permission is decrement-only. The interpretation of these two bits is shown in Table 249.

TABLE 249 Interpretation of AuthRW and NonAuthRW
NonAuthRW | AuthRW | Interpretation
0 | 0 | Read-only access (no-one can write to this field). This is the initial state for each field. At instantiation all of M₁ is 0 which means AuthRW and NonAuthRW are 0 for each field, and hence none of M₀ can be written to until a field is defined.
0 | 1 | Authenticated write access is permitted. Non-authenticated write access is not permitted.
1 | 0 | Authenticated write access is not permitted. Non-authenticated write access is permitted (i.e. anyone can write to this field).
1 | 1 | Authenticated write access is permitted. Non-authenticated write access is decrement-only.

[6748] If authenticated write access is permitted, there are 11 additional bits (bringing the total number of permission bits to 13) to more fully describe the kind of write access for each key. We only permit a single key to have the ability to write any value to the field, and the remaining keys are defined as being either not permitted to write, or as having decrement-only write access. A 3-bit KeyNum represents the slot number of the key that has the ability to write any value to the field (as long as the key is locked into its key slot), and an 8-bit KeyPerms defines the write permissions for the (maximum of) 8 keys as follows:

[6749] KeyPerms[n]=0: The key in slot n (i.e. K_(n)) has no write access to this field (except when n=KeyNum). Setting KeyPerms to 0 prohibits a key from transferring value (when an amount is deducted from a field in one QA Device and transferred to another field in a different QA Device).

[6750] KeyPerms[n]=1: The key in slot n (i.e. K_(n)) is permitted to perform decrement-only writes to this field (as long as K_(n) is locked in its key slot). Setting KeyPerms to 1 allows a key to transfer value (when an amount is deducted from a field in one QA Device and transferred to another field in a different QA Device).

[6751] The 13 bits of permissions (within bits 4-16 of M₁[n]) are allocated as follows:
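The effect of the two bits on a non-authenticated write can be expressed as a small Python predicate. This is a sketch of the interpretation in Table 249 only; the function name and the use of plain integers for field values are assumptions.

```python
def non_auth_write_allowed(non_auth_rw: int, auth_rw: int,
                           old_value: int, new_value: int) -> bool:
    """Decide whether a non-authenticated write may change old_value to new_value."""
    if not non_auth_rw:
        return False                     # rows 0/0 and 0/1: no unsigned writes
    if auth_rw:
        return new_value < old_value     # row 1/1: decrement-only
    return True                          # row 1/0: anyone can write any value
```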

8.1.1.3.1 Example 1

[6752] FIG. 367 shows an example of permission bits for a field.

[6753] In this example we can see:

[6754] NonAuthRW=0 and AuthRW=1, which means that only authenticated writes are allowed i.e. writes to the field without an appropriate signature are not permitted.

[6755] KeyNum=3, so the only key permitted to write any value to the field is key 3 (i.e. K₃).

[6756] KeyPerms[3]=0, which means that although key 3 is permitted to write to this field, key 3 can't be used to transfer value from this field to other QA Devices.

[6757] KeyPerms[0,4,5,6,7]=0, which means that these respective keys cannot write to this field.

[6758] KeyPerms[1,2]=1, which means that keys 1 and 2 have decrement-only access to this field i.e. they are permitted to write a new value to the field only when the new value is less than the current value.

8.1.1.3.2 Example 2

[6759] FIG. 368 shows a second example of permission bits for a field.

[6760] In this example we can see:

[6761] NonAuthRW=1 and AuthRW=1, which means that authenticated writes are allowed and writes to the field without a signature are only permitted when the new value is less than the current value (i.e. non-authenticated writes have decrement-only permission).

[6762] KeyNum=3, so the only key permitted to write any value to the field is key 3 (i.e. K₃).

[6763] KeyPerms[3]=1, which means that key 3 is permitted to write to this field, and can be used to transfer value from this field to other QA Devices.

[6764] KeyPerms[0,4,5,6,7]=0, which means that these respective keys cannot write to this field.

[6765] KeyPerms[1,2]=1, which means that keys 1 and 2 have decrement-only access to this field i.e. they are permitted to write a new value to the field only when the new value is less than the current value.
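The per-key rules used in the two examples can be sketched in Python as follows. The sketch assumes that AuthRW=1, that all referenced keys are locked into their slots, and that KeyPerms is held as an integer whose bit n is KeyPerms[n]; the function names are hypothetical.

```python
def auth_write_allowed(key_num: int, key_perms: int, slot: int,
                       old_value: int, new_value: int) -> bool:
    """Decide whether a write signed with the key in `slot` may be applied."""
    if slot == key_num:
        return True                      # KeyNum may write any value
    if (key_perms >> slot) & 1:
        return new_value < old_value     # decrement-only access
    return False                         # no write access


def can_transfer_value(key_perms: int, slot: int) -> bool:
    """KeyPerms[n]=1 allows the key in slot n to transfer value."""
    return bool((key_perms >> slot) & 1)
```

With these definitions, Example 1 corresponds to key_num=3 and key_perms=0b00000110, and Example 2 to key_num=3 and key_perms=0b00001110.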

[6766] 8.1.1.4 Summary of Field Attributes

[6767] FIG. 369 shows the breakdown of bits within the 32-bit field attribute value M₁[n]. Table 250 summarises each attribute.

TABLE 250 Attributes for a field
Attribute | Sub-attribute name | Size in bits | Interpretation
Type | Type | 15 | Gives additional identification of the data stored in the field within the context of the accessors of that field.
Permissions | KeyNum | 3 | The slot number of the key that has authenticated write access to the field.
Permissions | NonAuthRW | 1 | 0 = non-authenticated writes are not permitted to this field. 1 = non-authenticated writes are permitted to this field (see Table 249).
Permissions | AuthRW | 1 | 0 = authenticated writes are not permitted to this field. 1 = authenticated writes are permitted to this field.
Permissions | KeyPerms | 8 | Bitmap representing the write permissions for each of the keys when AuthRW = 1. For each bit: 0 = no write access for this key (except for key KeyNum); 1 = decrement-only access is permitted for this key.
Size and Position | EndPos | 4 | The word number in M₀ that holds the lsw of the field. The msw position is implied by M₁[fieldNum − 1].endPos − 1, where the msw of field 0 is word 15.
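The following Python sketch shows how the three top-level attribute groups might be packed into and extracted from a 32-bit M₁[n] word. Only the placement of the 13 permission bits in bits 4-16 is stated in the text; placing EndPos in bits 0-3 and Type in bits 17-31 is an assumption made for illustration, as is the ordering of sub-attributes inside the permission bits, which the sketch therefore leaves unsplit.

```python
def pack_field_attr(field_type: int, perms: int, end_pos: int) -> int:
    """Pack Type (15 bits), permissions (13 bits) and EndPos (4 bits) into one word."""
    if not (0 <= field_type < 1 << 15 and 0 <= perms < 1 << 13 and 0 <= end_pos < 16):
        raise ValueError("attribute out of range")
    # Assumed layout: Type in bits 31-17, permissions in bits 16-4, EndPos in bits 3-0.
    return (field_type << 17) | (perms << 4) | end_pos


def unpack_field_attr(word: int) -> tuple[int, int, int]:
    """Return (Type, permissions, EndPos) from a 32-bit attribute word."""
    return (word >> 17) & 0x7FFF, (word >> 4) & 0x1FFF, word & 0xF
```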

[6768] 8.1.1.5 Permissions of M₁

[6769] M₁ holds the field attributes for data stored in M₀, and each word of M₁ can be written to once only.

[6770] It is important that a system can determine which words are available for writing. While this can be determined by reading M₁ and determining which of the words is non-zero, a 16-bit permissions value P₁ is available, with each bit indicating whether or not a given word in M₁ has been written to. Bit n of P₁ represents the permissions for M₁[n] as follows:

TABLE 251 Interpretation of P₁[n] i.e. bit n of M₁'s permission
P₁[n] | Description
0 | writes to M₁[n] are not permitted i.e. this word is now read-only
1 | writes to M₁[n] are permitted

[6771] Since M₁ is write-once, whenever a word is written to in M₁, the corresponding bit of P₁ is also cleared, i.e. writing to M₁[n] clears P₁[n].

[6772] Writes to M₁[n] only succeed when all of M₁[0..n−1] have already been written to (i.e. previous fields are defined) i.e.

[6773] M₁[0..n−1] must have already been written to (i.e. P₁[0..n−1] are 0)

[6774] P₁[n]=1 (i.e. it has not yet been written to)

[6775] In addition, if M₁[n−1].endPos≠0, the new M₁[n] word will define the attributes of field n, so must be further checked as follows:

[6776] The new M₁[n].endPos must be valid (i.e. must be less than M₁[n−1].endPos)

[6777] If the new M₁[n].authRW is set, K_(keyNum) must be locked, and all keys referred to by the new M₁[n].keyPerms must also be locked.

[6778] However if M₁[n−1].endPos=0, then all of M₀ has been defined in terms of fields. Since enough fields have been created to allocate all of M₀, any remaining words in M₁ are available for write-once general data storage purposes, and are not checked any further.
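The write-once behaviour of M₁ and P₁ can be modelled with the Python sketch below. It is a simplified illustration under stated assumptions: EndPos is assumed to occupy the low 4 bits of each word, the key-locking checks are omitted, and a flag is used to remember that all of M₀ has been allocated (rather than re-reading M₁[n−1].endPos, which would be ambiguous once general data has been stored). The class and method names are hypothetical.

```python
END_POS_MASK = 0xF  # assumption: EndPos occupies the low 4 bits of M1[n]


class M1:
    def __init__(self):
        self.words = [0] * 16
        self.p1 = 0xFFFF            # P1[n] = 1 means M1[n] is still writable
        self.m0_allocated = False   # set once a field with EndPos 0 is defined

    def write(self, n: int, value: int) -> bool:
        if not (self.p1 >> n) & 1:
            return False            # M1[n] has already been written to
        if self.p1 & ((1 << n) - 1):
            return False            # an earlier word has not been written yet
        if not self.m0_allocated:
            end_pos = value & END_POS_MASK
            if n > 0 and end_pos >= (self.words[n - 1] & END_POS_MASK):
                return False        # EndPos must be less than the previous EndPos
            if end_pos == 0:
                self.m0_allocated = True
        self.words[n] = value
        self.p1 &= ~(1 << n)        # writing to M1[n] clears P1[n]
        return True
```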

[6779] 8.1.2 M₂₊

[6780] M₂, M₃ etc., referred to as M₂₊, contain all the data that can be updated by anyone (i.e. no authenticated write is required) until the permissions for those sub-parts of M₂₊ have changed from read/write to read-only.

[6781] The same permissions representation as used for M₁ is also used for M₂₊. Consequently P_(n) is a 16-bit value that contains the permissions for M_(n) (where n>0). The permissions for word w of M_(n) are given by a single bit P_(n)[w]. However, unlike writes to M₁, writes to M₂₊ do not automatically clear bits in P. Only when the bits in P₂₊ are explicitly cleared (by anyone) do those corresponding words become read-only and final.

[6782] 9 Session Data

[6783] Data that is valid only for the duration of a particular communication session is referred to as session data. Session data ensures that every signature contains different data (sometimes referred to as a nonce) and this prevents replay attacks.

[6784] 9.1 R

[6785] R is a 160-bit random number seed that is set up (when the QA Device is instantiated) and from that point on it is internally managed and updated by the QA Device. R is used to ensure that each signed item contains time varying information (not chosen by an attacker), and each QA Device's R is unrelated from one QA Device to the next.

[6786] This R is used in the generation and testing of signatures.

[6787] An attacker must not be able to deduce the values of R in present and future devices. Therefore, R should be programmed with a cryptographically strong random number, gathered from a physically random phenomenon (must not be deterministic).

[6788] 9.2 Advancing R

[6789] The session component of the message must only last for a single session (challenge and response).

[6790] The rules for updating R are as follows:

[6791] Reads of R do not advance R.

[6792] Every time a signature is produced with R, R is advanced to a new random number.

[6793] Every time a signature including R is tested and is found to be correct, R is advanced to a new random number.
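The update rules for R can be modelled with a small Python class. This is a sketch only: the text does not specify how R is advanced, so os.urandom is used here as a stand-in for the device's internal generator, and the class and method names are hypothetical.

```python
import os


class SessionR:
    """Models a QA Device's 160-bit R under the update rules above."""

    def __init__(self):
        self._r = os.urandom(20)         # 160 bits, set up at instantiation

    def read(self) -> bytes:
        return self._r                   # reads of R do not advance R

    def on_signature_produced(self) -> None:
        self._advance()

    def on_signature_tested(self, correct: bool) -> None:
        if correct:                      # only a correct signature advances R
            self._advance()

    def _advance(self) -> None:
        self._r = os.urandom(20)         # stand-in for the device's own update
```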

[6794] 9.3 R_(L) and R_(E)

[6795] Each signature contains 2 pieces of session data i.e. 2 Rs:

[6796] One R comes from the QA Device issuing the challenge i.e. the challenger. This is so the challenger can ensure that the challenged QA Device isn't simply replaying an old signature i.e. the challenger is protecting itself against the challenged.

[6797] One R comes from the device responding to the challenge i.e. the challenged. This is so the challenged never signs anything that is given to it without inserting some time varying change i.e. protects the challenged from the challenger in case the challenger is actually an attacker performing a chosen text attack.

[6798] Since there are two Rs, we need to distinguish between them. We do so by defining each R as external (R_(E)) or local (R_(L)) depending on its use in a given function. For example, the challenger sends out its local R, referred to as R_(L). The device being challenged receives the challenger's R as an external R, i.e. R_(E). It then generates a signature using its R_(L) and the challenger's R_(E). The resultant signature and R_(L) are sent to the challenger as the response. The challenger receives the signature and R_(E) (signature and R_(L) produced by the device being challenged), produces its own signature using R_(L) (sent to the device being challenged earlier) and R_(E) received, and compares that signature to the signature received as response.
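The exchange of R_(L) and R_(E) can be sketched in Python using the standard hmac and hashlib modules. The sketch assumes HMAC-SHA1 over Data followed by the responder's R and then the challenger's R, with "|" read as byte concatenation; the function names are hypothetical and both sides are assumed to hold the same key.

```python
import hashlib
import hmac
import os


def sign(key: bytes, data: bytes, r_responder: bytes, r_challenger: bytes) -> bytes:
    return hmac.new(key, data + r_responder + r_challenger, hashlib.sha1).digest()


def respond(key: bytes, data: bytes, r_external: bytes) -> tuple[bytes, bytes]:
    """Challenged device: sign with its own R_L and the challenger's R (its R_E)."""
    r_local = os.urandom(20)
    return sign(key, data, r_local, r_external), r_local


def verify(key: bytes, data: bytes, r_local: bytes,
           sig: bytes, r_external: bytes) -> bool:
    """Challenger: r_local is the R it sent out, r_external is the responder's R."""
    expected = sign(key, data, r_external, r_local)
    return hmac.compare_digest(expected, sig)
```

Note that the same value plays the role of R_(L) on one device and R_(E) on the other, which is why the argument order is swapped between respond and verify.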

[6799] Signature Functions

[6800] 10 Objects

[6801] 10.1 KeyRef

[6802] 10.1.1 Object Description

[6803] Instead of passing keys directly into a function, a KeyRef (i.e. key reference) object is passed instead. A KeyRef object encapsulates the process by which a key is formed for common and variant forms of signature generation (based on the setting of the variables within the object). A KeyRef defines which key to use, whether it is a common or variant form of that key, and, if it is a variant form, the ChipId to use to create the variant. For more information about common and variant forms of keys, see Section 7.2.

[6804] Users pass KeyRef objects in as input parameters to public functions of the QA Chip Logical Interface, and these KeyRefs are subsequently passed to the signature function (called within the interface function). Note, however, that the method functions for KeyRef objects are not available outside the QA Chip Logical Interface.

[6805] 10.1.2 Object Variables

[6806] Table 252 describes each of the variables within a KeyRef object.

TABLE 252 Description of object variables for KeyRef object
Parameter | Description
keyNum | Slot number of the key to use as the basis for key formation
useChipId | 0 = the key to be formed is a common key (i.e. is the same as K_(keyNum)). 1 = the key to be formed is a variant key based on K_(keyNum).
chipId | When useChipId = 1, this is the ChipId to be used to form the variant key (this will be the ChipId of the QA Device which stores the variant of K_(keyNum)). When useChipId = 0, chipId is not used.

[6807] 10.1.3 Object Methods

[6808] 10.1.3.1 getKey

[6809] public key getKey(void)

[6810] 10.1.3.1.1 Method Description

[6811] This method is a public method (public in object oriented terms, not public to users of the QA Chip Logical Interface) and is called by the GenerateSignature function to return the key for use in signature generation.

[6812] If useChipId is true, the formKeyVariant method is called to form the key using chipId and then return the variant key. If useChipId is false, the key stored in slot keyNum is returned.

[6813] 10.1.3.1.2 Method Sequence

[6814] The getKey method is illustrated by the following pseudocode:

If (useChipId = 0)
  key ← K_(keyNum)
Else
  key ← formKeyVariant( )
EndIf
Return key

[6815] 10.1.3.2 formKeyVariant

[6816] private key formKeyVariant(void)

[6817] 10.1.3.2.1 Method Description

[6818] This method produces the variant form of a key, based on the K_(keyNum) and chipId. As described in Section 7.2, the variant form of key K_(keyNum) is generated by owf(K_(keyNum), chipId) where owf is a one-way function.

[6819] In addition, the time taken by owf must not depend on the value of the key i.e. the timing should be effectively constant. This prevents timing attacks on the key.

[6820] At present, owf is SHA1, although this still needs to be verified. Thus the variant key is defined to be SHA1(K_(keyNum)|chipId).

[6821] 10.1.3.2.2 Method Sequence

[6822] The formKeyVariant method is illustrated by the following pseudocode:

key ← SHA1(K_(keyNum) | chipId) # Calculation must take constant time
Return key
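A KeyRef object with its getKey and formKeyVariant methods might be sketched in Python as follows. The sketch reads "|" as byte concatenation and passes the device's key slots in as an argument, both of which are assumptions; it also makes no claim about the constant-time requirement, which depends on the underlying SHA1 implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class KeyRef:
    key_num: int            # slot number of the key to use
    use_chip_id: bool       # False = common key, True = variant key
    chip_id: bytes = b""    # only used when use_chip_id is True

    def get_key(self, key_slots) -> bytes:
        if not self.use_chip_id:
            return key_slots[self.key_num]
        return self._form_key_variant(key_slots)

    def _form_key_variant(self, key_slots) -> bytes:
        # Variant key = SHA1(K_keyNum | chipId), with "|" taken as concatenation.
        return hashlib.sha1(key_slots[self.key_num] + self.chip_id).digest()
```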

[6823] 11 Functions

[6824] Digital signatures form the basis of all authentication protocols within the QA Chip Logical Interface. The signature functions are not directly available to users of the QA Chip Logical Interface, since a golden rule of digital signatures is never to sign anything exactly as it has been given to you.

[6825] Instead, these signature functions are internally available to the functions that comprise the public interface, and are used by those functions for the formation of keys and the generation of signatures.

[6826] 11.1 GenerateSignature

[6827] Input: KeyRef, Data, Random1, Random2

[6828] Output: SIG

[6829] Changes: None

[6830] Availability: All devices

[6831] 11.1.1 Function Description

[6832] This function uses KeyRef to obtain the actual key required for signature generation, appends Random1 and Random2 to Data, and performs HMAC_SHA1[key, Data] to output a signature. HMAC_SHA1 is described in [1]. In addition, this operation must take constant time irrespective of the value of the key (see Section 10.1.3.2 for more details).

[6833] 11.1.2 Input Parameter Description

[6834] Table 253 describes each of the input parameters:

TABLE 253 Description of input parameters for GenerateSignature
Parameter | Description
KeyRef | This is an instance of the KeyRef object for use by the GenerateSignature function. For common key signature generation: KeyRef.keyNum = Slot number of the key to be used to produce the signature; KeyRef.useChipId = 0. For variant key signature generation: KeyRef.keyNum = Slot number of the key to be used for generating the variant key, where the variant key is to be used to produce the signature; KeyRef.useChipId = 1; KeyRef.chipId = ChipId of the QA Device which stores the variant of K_(KeyRef.keyNum), and uses the variant key for signature generation.
Data | Preformatted data to be signed. Random1 and Random2 are appended to Data before the signature is generated to ensure that the signature is session based (applicable only to a single session).
Random1 | This is the session component from the QA Device that is responding to the challenge.
Random2 | This is the session component from the QA Device that issued the challenge.

[6835] 11.1.3 Output Parameter Description

[6836] Table 254 describes each of the output parameters.

TABLE 254 Description of output parameters for GenerateSignature
Parameter | Description
SIG | SIG = SIG_(key)(Data|Random1|Random2), where key = KeyRef.getKey( )

[6837] 11.1.4 Function Sequence

[6838] The GenerateSignature function is illustrated by the following pseudocode:

key ← KeyRef.getKey( )
dataToBeSigned ← Data|Random1|Random2
SIG ← HMAC_SHA1(key, dataToBeSigned) # Calculation must take constant time
Output SIG
Return
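The GenerateSignature sequence maps directly onto Python's standard hmac module, as in the following sketch. It takes the already-formed key rather than a KeyRef, reads "|" as byte concatenation, and uses a hypothetical function name; whether the computation is constant-time depends on the underlying implementation and is not asserted here.

```python
import hashlib
import hmac


def generate_signature(key: bytes, data: bytes,
                       random1: bytes, random2: bytes) -> bytes:
    """Return HMAC-SHA1 over Data | Random1 | Random2 as a 20-byte signature."""
    data_to_be_signed = data + random1 + random2
    return hmac.new(key, data_to_be_signed, hashlib.sha1).digest()
```

Because Random1 and Random2 occupy fixed positions in the signed data, swapping them produces a different signature.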

[6839] Basic Functions

[6840] 12 Definitions

[6841] This section defines return codes and constants referred to by functions and pseudocode.

[6842] 12.1 ResultFlag

[6843] The ResultFlag is a byte that indicates the return status from a function. Callers can use the value of ResultFlag to determine whether a call to a function succeeded or failed, and if the call failed, the specific error condition.

[6844] Table 255 describes the ResultFlag values and the mnemonics used in the pseudocode.

TABLE 255 ResultFlag value description
Mnemonic | Description | Possible causes
Pass | Function completed successfully | Function successfully completed requested task.
Fail | General Failure | An error occurred during function processing.
BadSig | Signature mismatch | Input signature didn't match the generated signature.
InvalidKey | KeyRef incorrect | Input KeyRef.keyNum > 3.
InvalidVector | VectNum incorrect | Input M_(VectNum) > 3.
InvalidPermission | Permission not adequate to perform operation. | Trying to perform a Write or WriteAuth with incorrect permissions.
KeyAlreadyLocked | Key already locked. | Key cannot be changed because it has already been locked.

[6845] 12.2 Constants

[6846] Table 256 describes the constants referred to by functions and pseudocode.

TABLE 256 Constants
Definition | Value
MaxKey | NumKeys − 1 (typically 7)
MaxM | NumVectors − 1 (typically 3)
MaxWordInM | 16 − 1 = 15

[6847] 13 GetInfo

[6848] Input: None

[6849] Output: ResultFlag, SoftwareReleaseIdMajor, SoftwareReleaseIdMinor, NumVectors, NumKeys, ChipId, DepthOfRollBackCache (for an upgrade device only)

[6850] Changes: None

[6851] Availability: All devices

[6852] 13.1 Function Description

[6853] Users of QA Devices must call the GetInfo function on each QA Device before calling any other functions on that device.

[6854] The GetInfo function tells the caller what kind of QA Device this is, what functions are available and what properties this QA Device has. The caller can use this information to correctly call functions with appropriately formatted parameters.

[6855] The first value returned, SoftwareReleaseIdMajor, effectively identifies what kind of QA Device this is, and therefore what functions are available to callers. SoftwareReleaseIdMinor tells the caller which version of the specific type of QA Device this is. The mapping between the SoftwareReleaseIdMajor and type of device and their different functions is described in Table 258. Every QA Device also returns NumVectors, NumKeys and ChipId which are required to set input parameter values for commands to the device.

[6856] Additional information may be returned depending on the type of QA Device. The VarDataLen and VarData fields of the output hold this additional information.

[6857] 13.2 Output parameters

[6858] Table 257 describes each of the output parameters. TABLE 257Description of output parameters for GetInfo function # Parameter bytesDescription ResultFlag Indicates whether the function completedsuccessfully or not. If it did not complete successfully, the reason forthe failure is returned here. See Section 12.1. SoftwareReleaseIdMajor 1This defines the function set that is available on this QA Device.SoftwareReleaseIdMinor 1 This defines minor software releases within amajor release, and are incremental changes to the software mainly todeal with bug fixes. NumVectors 1 Total number of memory vectors in thisQA Device. NumKeys 1 Total number of keys in this QA Device. ChipId 6This QA Device's ChipId VarDataLen 1 Length of bytes to follow. VarData(VarDataLen This is additional bytes) application specific data, andwill be of length VarDataLen (i.e. may be 0).

[6859] Table 258 shows the mapping between the SoftwareReleaseIdMajor, the type of QA Device and the available device functions.

TABLE 258 Mapping between SoftwareReleaseIdMajor and available device functions
SoftwareReleaseIdMajor | Device description | Functions available
1 | Ink or Printer QA Device | GetInfo, Random, Read, Test, Translate, WriteM1+, WriteFields, WriteFieldsAuth, SetPerm, ReplaceKey
2 | Value Upgrader QA Device (e.g. Ink Refill QA Device) | All functions in the Ink or Printer Device, plus: StartXfer, XferAmount, StartRollBack, RollBackAmount
3 | Parameter Upgrader QA Device | All functions in the Ink or Printer device, plus: StartXfer, XferField, StartRollBack, RollBackField
4 | Key Replacement device | All functions in the Ink or Printer Device, plus: GetProgramKey. ReplaceKey is different from the Ink or Printer device.
5 | Trusted device | All functions in the Ink or Printer Device, plus: SignM

[6860] Table 259 shows the VarData components for Value Upgrader and Parameter Upgrader QA Devices.

TABLE 259 VarData for Value and Parameter Upgrader QA Devices
VarData Components | Length in bytes | Description
DepthOfRollBackCache | 1 | The number of datasets that can be accommodated in the Xfer Entry cache of the device.

[6861] 13.3 Function Sequence

[6862] The GetInfo command is illustrated by the following pseudocode:

[6863] Output SoftwareReleaseIdMajor

[6864] Output SoftwareReleaseIdMinor

[6865] Output NumVectors

[6866] Output NumKeys

[6867] Output ChipId

[6868] VarDataLen ← 1 # In case of an upgrade device

[6869] Output DepthOfRollBackCache

[6870] Return
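By way of a non-limiting sketch, a caller might decode the GetInfo output of Table 257 as follows. The sketch assumes that the fields arrive as one contiguous byte string in table order and that ResultFlag occupies a single byte (the table does not state its size). The names GetInfoResult and parse_get_info are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class GetInfoResult:
    result_flag: int
    release_major: int
    release_minor: int
    num_vectors: int
    num_keys: int
    chip_id: bytes
    var_data: bytes


def parse_get_info(raw: bytes) -> GetInfoResult:
    """Decode a GetInfo reply laid out in Table 257 order.

    Assumption: ResultFlag is one byte; the source does not give its size.
    """
    if len(raw) < 12:
        raise ValueError("GetInfo reply too short")
    var_len = raw[11]                    # VarDataLen follows the 6-byte ChipId
    var_data = raw[12:12 + var_len]
    if len(var_data) != var_len:
        raise ValueError("VarData shorter than VarDataLen")
    return GetInfoResult(raw[0], raw[1], raw[2], raw[3], raw[4],
                         raw[5:11], var_data)
```

For an upgrader device, var_data would carry the single DepthOfRollBackCache byte of Table 259.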

[6871] 14 Random

[6872] Input: None

[6873] Output: R_(L)

[6874] Changes: None

[6875] Availability: All devices

[6876] The Random command is used by the caller to obtain a session component (challenge) for use in subsequent signature generation.

[6877] If a caller calls the Random function multiple times, the same output will be returned each time. R_(L) (i.e. this QA Device's R) will only advance to the next random number in the sequence after a successful test of a signature or after producing a new signature. The same R_(L) can never be used to produce two signatures from the same QA Device.

[6878] The Random command is illustrated by the following pseudocode:

[6879] Output R_(L)

[6880] Return
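The behaviour of R_(L) described above can be modelled with a short sketch. It is illustrative only: the class name SessionR is hypothetical, and the use of SHA-1 chaining to step to the next value is an assumption made purely to obtain a 160-bit sequence; this section does not describe how the random sequence is generated.

```python
import hashlib


class SessionR:
    """Toy model of R_L: stable across Random calls, advanced only on demand."""

    def __init__(self, seed: bytes):
        # Assumption: a SHA-1 digest stands in for the device's 160-bit R.
        self._r = hashlib.sha1(seed).digest()

    def random(self) -> bytes:
        # Repeated calls return the same value; Random changes nothing.
        return self._r

    def advance(self) -> None:
        # Called only after a successful Test or after producing a signature.
        self._r = hashlib.sha1(self._r).digest()
```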

[6881] 15 Read

[6882] Input: KeyRef, SigOnly, MSelect, KeyIdSelect, WordSelect, R_(E)

[6883] Output: ResultFlag, SelectedWordsOfSelectedMs, SelectedKeyIds, R_(L), SIG_(out)

[6884] Changes: R_(L)

[6885] Availability: All devices

[6886] 15.1 Function Description

[6887] The Read command is used to read data and KeyIds from a QA Device. The caller can specify which words from M and which KeyIds are read.

[6888] The Read command can return both data and signature, or just the signature of the requested data.

[6889] Since the return of data is based on the caller's input request, it prevents unnecessary information from being sent back to the caller. Callers typically request only the signature in order to confirm that locally cached values match the values on the QA Device.

[6890] The data read from an untrusted QA Device (A) using a Read command is validated by a trusted QA Device (B) using the Test command. The R_(L) and SIG_(out) produced as output from the Read command are input (along with correctly formatted data) to the Test command on a trusted QA Device for validation of the signature and hence the data. SIG_(out) can also optionally be passed through the Translate command on a number of QA Devices between Read and Test if the QA Devices A and B do not share keys.

[6891] 15.2 Input Parameters

[6892] Table 260 describes each of the input parameters:

TABLE 260 Description of input parameters for Read
Parameter | Description
KeyRef | For common key signature generation: KeyRef.keyNum = Slot number of the key to be used for producing the output signature. KeyRef.useChipId = 0. No variant key signature generation required.
SigOnly | Flag indicating return of signature and data. 0 - indicates both the signature and data are to be returned. 1 - indicates only the signature is to be returned.
MSelect | Selection of memory vectors to be read - each bit corresponding to a given memory vector (a maximum of NumVectors bits). 0 - indicates the memory vector must not be read. 1 - indicates memory vector must be read.
KeyIdSelect | Selection of KeyIds to be read - each bit corresponds to a given KeyId (a maximum of NumKeys bits). 0 - indicates KeyId must not be read. 1 - indicates KeyId must be read.
WordSelect | Selection of words read from a desired M as requested in MSelect. Each WordSelect is 16 bits corresponding to each bit in MSelect. Each bit in the WordSelect indicates whether or not to read the corresponding word for the particular M. 0 - indicates word must not be read. 1 - indicates word must be read.
R_(E) | External random value required for output signature generation (i.e. the challenge). R_(E) is obtained by calling the Random function on the device which will receive the SIG_(out) from the Read function.

[6893] 15.3 Output Parameters

[6894] Table 261 describes each of the output parameters.

Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1.
SelectedWordsOfSelectedMs | Selected words from selected memory vectors as requested by MSelect and WordSelect.
SelectedKeyIds | Selected KeyIds as requested by KeyIdSelect.
R_(L) | Local random value added to the output signature (i.e. SIG_(out)). Refer to FIG. 370.
SIG_(out) | SIG_(out) = SIG_(KeyRef)(data | R_(L) | R_(E)) as shown in FIG. 8. Refer to Section 10.1.3.1 for details.

[6895] 15.3.1 SIG_(out)

[6896] FIG. 370 shows the formatting of data for output signature generation.

[6897] Table 262 gives the parameters included in SIG_(out).

Parameter | Length in bits | Value set internally | Value set from Input
RWSense | 3 | read constant = 000. Refer to Section 15.3.1.1 |
MSelect | 4 | |
KeyIdSelect | 8 | |
ChipId | 48 | This QA Device's ChipId |
WordSelect | 16 per M | |
SelectedWordsOfSelectedMs | 32 per word | The appropriate words from the various Ms as selected by the caller |
R_(L) | 160 | This QA Device's current R |
R_(E) | 160 | |

[6898] 15.3.1.1 RWSense

[6899] An RWSense value is present in the signed data to distinguish whether a signature was produced from a Read or produced for a WriteAuth.

[6900] The RWSense is set to a read constant (000) for producing a signature from a read function. The RWSense is set to a write constant (001) for producing a signature for a write function.

[6901] The RWSense prevents signatures produced by Read from being subsequently sent into a WriteAuth function. Only signatures produced with RWSense set to write (001) are accepted by a write function.
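The effect of RWSense can be illustrated with the following sketch, in which a signature produced with the read constant fails verification by a write function even though the key, payload and random values are identical. The sketch makes two assumptions that are not taken from this section: HMAC-SHA1 stands in for the signature function SIG_(key), and RWSense is carried in a whole byte rather than a 3-bit field. The function names sign and write_accepts are hypothetical.

```python
import hashlib
import hmac

RW_READ, RW_WRITE = 0b000, 0b001


def sign(key: bytes, rw_sense: int, payload: bytes,
         r_first: bytes, r_second: bytes) -> bytes:
    # Assumption: HMAC-SHA1 over RWSense | payload | R | R models SIG_key.
    message = bytes([rw_sense]) + payload + r_first + r_second
    return hmac.new(key, message, hashlib.sha1).digest()


def write_accepts(key: bytes, payload: bytes, r_e: bytes, r_l: bytes,
                  sig_e: bytes) -> bool:
    # A write function always recomputes the signature with the write constant,
    # so a signature built with the read constant cannot match.
    expected = sign(key, RW_WRITE, payload, r_e, r_l)
    return hmac.compare_digest(expected, sig_e)
```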

[6902] 15.4 Function sequence

[6903] The Read command is illustrated by the following pseudocode:

Accept input parameters - KeyRef, SigOnly, MSelect, KeyIdSelect
# Accept input parameter WordSelect based on MSelect
For i ← 0 to MaxM
  If(MSelect[i] = 1)
    Accept next WordSelect
    WordSelectTemp[i] ← WordSelect
  EndIf
EndFor
Accept R_(E)
Check range of KeyRef.keyNum
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf
# Build SelectedWordsOfSelectedMs
k ← 0 # k stores the word count for SelectedWordsOfSelectedMs
SelectedWordsOfSelectedMs[k] ← 0
For i ← 0 to 3
  If(MSelect[i] = 1)
    For j ← 0 to MaxWordInM
      If(WordSelectTemp[i][j] = 1)
        SelectedWordsOfSelectedMs[k] ← (M_(i)[j])
        k++
      EndIf
    EndFor
  EndIf
EndFor
# Build SelectedKeyIds
l ← 0 # l stores the word count for SelectedKeyIds
SelectedKeyIds[l] ← 0
For i ← 0 to MaxKey
  If(KeyIdSelect[i] = 1)
    SelectedKeyIds[l] ← KeyId[i]
    l++
  EndIf
EndFor
# Generate message for passing into the GenerateSignature function
data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|SelectedWordsOfSelectedMs|SelectedKeyIds) # Refer to FIG. 370.
# Generate Signature function
SIG_(L) ← GenerateSignature(KeyRef,data,R_(L),R_(E)) # See Section 11.1
Update R_(L) to R_(L2)
ResultFlag ← Pass
Output ResultFlag
If(SigOnly = 0)
  Output SelectedWordsOfSelectedMs, SelectedKeyIds
EndIf
Output R_(L), SIG_(L)
Return
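The word-selection step of the Read sequence can be sketched as follows. The sketch assumes that bit i of MSelect (least significant bit first) selects memory vector i and that bit j of the corresponding WordSelect selects word j; the bit ordering is an assumption, and the name select_words is hypothetical.

```python
def select_words(m_vectors, m_select, word_selects):
    """Collect the words chosen by MSelect and the per-vector WordSelect masks.

    m_vectors:    list of memory vectors, each a list of 16 words
    m_select:     bitmask, bit i set means vector i is read
    word_selects: dict mapping vector index to its 16-bit WordSelect mask
    """
    selected = []
    for i, vector in enumerate(m_vectors):
        if (m_select >> i) & 1:
            mask = word_selects[i]
            for j, word in enumerate(vector):
                if (mask >> j) & 1:
                    selected.append(word)
    return selected
```

Only the selected words are returned, which mirrors the statement above that unnecessary information is not sent back to the caller.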

[6904] 16 Test

[6905] Input: KeyRef, DataLength, Data, R_(E), SIG_(E)

[6906] Output: ResultFlag

[6907] Changes: R_(L)

[6908] Availability: All devices except ink device

[6909] 16.1 Function Description

[6910] The Test command is used to validate data that has been read from an untrusted QA Device according to a digital signature SIG_(E). The data will typically be memory vector and KeyId data. SIG_(E) (and its related R_(E)) is the most recent signature: this will be the signature produced by Read if Translate was not used, or will be the output from the most recent Translate if Translate was used. The Test function produces a local signature (SIG_(L) = SIG_(key)(Data|R_(E)|R_(L))) and compares it to the input signature (SIG_(E)). If the two signatures match, the function returns 'Pass', and the caller knows that the data read can be trusted.

[6911] The key used to produce SIG_(L) depends on whether SIG_(E) was produced by a QA Device sharing a common key or a variant key. The KeyRef object passed into the interface must be set appropriately to reflect this.

[6912] The Test function accepts preformatted data (as DataLength number of words), and appends the external R_(E) and local R_(L) to the preformatted data to generate the signature as shown in FIG. 371.

[6913] 16.2 Input Parameters

[6914] Table 263 describes each of the input parameters.

TABLE 263 Description of input parameters for Test
Parameter | Description
KeyRef | For testing common key signature: KeyRef.keyNum = Slot number of the key to be used for testing the signature. SIG_(E) produced using K_(KeyRef.keyNum) by the external device. KeyRef.useChipId = 0. For testing variant key signature: KeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(E) produced using a variant of K_(KeyRef.keyNum) by the external device. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which generated SIG_(E) using a variant of K_(KeyRef.keyNum).
DataLength | Length of preformatted data in words. Must be non zero.
Data | Preformatted data to be used for producing the signature.
R_(E) | External random value required for verifying the input signature. This will be the R from the input signature generator (i.e. the device generating SIG_(E)).
SIG_(E) | External signature required for authenticating input data as shown in FIG. 371. The external signature is generated either by a Read function or a Translate function. A correct SIG_(E) = SIG_(KeyRef)(Data | R_(E) | R_(L)).

[6915] 16.2.1 Input Signature Verification Data Format

[6916] FIG. 371 shows the formatting of data for input signature verification.

[6917] The data in FIG. 371 (i.e. not R_(E) or R_(L)) is typically output from a Read function (formatted as per FIG. 370). The data may also be generated in the same format by the system from its cache, as will be the case when it performs a Read using SigOnly=1.

[6918] 16.3 Output Parameters

[6919] Table 264 describes each of the output parameters.

TABLE 264 Description of output parameters for Test
Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1.

[6920] 16.4 Function Sequence

[6921] The Test command is illustrated by the following pseudocode:

[6922]
Accept input parameters - KeyRef, DataLength
# Accept input parameter - Data based on DataLength
For i ← 0 to (DataLength − 1)
  Accept next word of Data
EndFor
Accept input parameters - R_(E), SIG_(E)
Check range of KeyRef.keyNum
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf
# Generate signature
SIG_(L) ← GenerateSignature(KeyRef,Data,R_(E),R_(L)) # Refer to FIG. 371.
# Check signature
If(SIG_(L) = SIG_(E))
  Update R_(L) to R_(L2)
  ResultFlag ← Pass
Else
  ResultFlag ← BadSig
EndIf
Output ResultFlag
Return
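The Test sequence can be sketched as below. HMAC-SHA1 is assumed as a stand-in for the signature function, and SHA-1 chaining is assumed as the way R_(L) advances; neither is stated in this section. The class name TrustedDevice is hypothetical. The sketch shows that R_(L) advances only on a successful test, so that replaying the same signature fails.

```python
import hashlib
import hmac


class TrustedDevice:
    def __init__(self, key: bytes, r_l: bytes):
        self.key = key
        self.r_l = r_l

    def test(self, data: bytes, r_e: bytes, sig_e: bytes) -> str:
        # SIG_L = SIG_key(Data | R_E | R_L), modelled here with HMAC-SHA1.
        sig_l = hmac.new(self.key, data + r_e + self.r_l, hashlib.sha1).digest()
        if hmac.compare_digest(sig_l, sig_e):
            # Advance R_L only after a successful test (assumed update rule).
            self.r_l = hashlib.sha1(self.r_l).digest()
            return "Pass"
        return "BadSig"
```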

[6923] 17 Translate

[6924] Input: InputKeyRef, DataLength, Data, R_(E), SIG_(E),OutputKeyRef, R_(E2)

[6925] Output: ResultFlag, R_(L2), SIG_(out)

[6926] Changes: R_(L)

[6927] Availability: Printer device, and possibly on other devices

[6928] 17.1 Function Description

[6929] It is possible for a system to call the Read function on QA Device A to obtain data and signature, and then call the Test function on QA Device B to validate the data and signature. In the same way it is possible for a system to call the SignM function on a trusted QA Device B and then call the WriteAuth function on QA Device B to actually store data on B. Both of these actions are only possible when QA Devices A and B share secret key information.

[6930] If, however, A and B do not share secret keys, we can create a validation chain (and hence extension of trust) by means of translation of signatures. A given QA Device can only translate signatures if it knows the key of the previous stage in the chain as well as the key of the next stage in the chain. The Translate function provides this functionality.

[6931] The Translate function translates a signature from one based on one key to one based on another key. The Translate function first performs a test of the input signature using the InputKeyRef, and if the test succeeds produces an output signature using the OutputKeyRef. The Translate function can therefore in some ways be considered to be a combination of the Test and Read functions, except that the data is input into the QA Device instead of being read from it.

[6932] The InputKeyRef object passed into Translate must be set appropriately to reflect whether SIG_(E) was produced by a QA Device sharing a common key or a variant key.

[6933] The key used to produce output signature SIG_(out) depends on whether the translating device shares a common key or a variant key with the QA Device receiving the signature. The OutputKeyRef object passed into Translate must be set appropriately to reflect this.

[6934] Since the Translate function does not interpret or generate the data in any way, only preformatted data can be passed in. The Translate function does however append the external R_(E) and local R_(L) to the preformatted data for verifying the input signature, then advances R_(L) to R_(L2), and appends R_(L2) and R_(E2) to the preformatted data to produce the output signature. This is done to protect the keys and prevent replay attacks.

[6935] The Translate function translates:

[6936] signatures for subsequent use in Test, typically originating from Read

[6937] signatures for subsequent use in WriteAuth, typically originating from SignM

[6938] In both cases, preformatted data is passed into the Translate function by the system. For translation of data destined for Test, the data should be preformatted as per FIG. 370 (all words except the Rs). For translation of signatures for use in WriteAuth, the data should be preformatted as per FIG. 373 (all words except the Rs).

[6939] 17.2 Input Parameters

[6940] Table 265 describes each of the input parameters.

TABLE 265 Description of input parameters for Translate
Parameter | Description
InputKeyRef | For translating common key input signature: InputKeyRef.keyNum = Slot number of the key to be used for testing the signature. SIG_(E) produced using K_(InputKeyRef.keyNum) by the external device. InputKeyRef.useChipId = 0. For translating variant key input signatures: InputKeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(E) produced using a variant of K_(InputKeyRef.keyNum) by the external device. InputKeyRef.useChipId = 1. InputKeyRef.chipId = ChipId of the device which generated SIG_(E) using a variant of K_(InputKeyRef.keyNum).
DataLength | Length of data in words.
Data | Data used for testing the input signature and for producing the output signature.
R_(E) | External random value required for verifying input signature. This will be the R from the input signature generator (i.e. device generating SIG_(E)).
SIG_(E) | External signature required for authenticating input data. The external signature is either generated by a Read function, a Xfer/Rollback function or a Translate function. A correct SIG_(E) = SIG_(KeyRef)(Data | R_(E) | R_(L)).
OutputKeyRef | For generating common key output signature: OutputKeyRef.keyNum = Slot number of the key for producing the output signature. SIG_(out) produced using K_(OutputKeyRef.keyNum) because the device receiving SIG_(out) shares K_(OutputKeyRef.keyNum) with the translating device. OutputKeyRef.useChipId = 0. For generating variant key output signature: OutputKeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(out) produced using a variant of K_(OutputKeyRef.keyNum) because the device receiving SIG_(out) shares a variant of K_(OutputKeyRef.keyNum) with the translating device. OutputKeyRef.useChipId = 1. OutputKeyRef.chipId = ChipId of the device which receives SIG_(out) produced by a variant of K_(OutputKeyRef.keyNum).
R_(E2) | External random value required for output signature generation. This will be the R from the destination of SIG_(out). R_(E2) is obtained by calling the Random function on the device which will receive the SIG_(out) from the Translate function.

[6941] 17.2.1 Input Signature Verification Data Format

[6942] This is the same format as used in the Test function. Refer to Section 16.2.1.

[6943] 17.3 Output parameters

[6944] Table 266 describes each of the output parameters.

TABLE 266 Description of output parameters for Translate
Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1.
R_(L2) | Local random value used in output signature (i.e. SIG_(Out)).
SIG_(Out) | Output signature produced using OutputKeyRef.keyNum using the data format described in FIG. 372. SIG_(Out) = SIG_(OutKeyRef)(Data | R_(L2) | R_(E2)). Refer to Section 10.1.3.1 for details.

[6945] 17.3.1 SIG_(out)

[6946] FIG. 372 shows the data format for output signature generation from the Translate function.

[6947] 17.4 Function Sequence

[6948] The Translate command is illustrated by the following pseudocode:

Accept input parameters - InputKeyRef, DataLength
# Accept input parameter - Data based on DataLength
For i ← 0 to (DataLength − 1)
  Accept next Data
EndFor
Accept input parameters - R_(E), SIG_(E), OutputKeyRef, R_(E2)
Check range of InputKeyRef.keyNum and OutputKeyRef.keyNum
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf
# Generate Signature
SIG_(L) ← GenerateSignature(InputKeyRef,Data,R_(E),R_(L)) # Refer to FIG. 371.
# Validate input signature
If(SIG_(L) = SIG_(E))
  Update R_(L) to R_(L2)
Else
  ResultFlag ← BadSig
  Output ResultFlag
  Return
EndIf
# Generate output signature
SIG_(Out) ← GenerateSignature(OutputKeyRef,Data,R_(L2),R_(E2)) # Refer to FIG. 372.
Update R_(L2) to R_(L3)
ResultFlag ← Pass
Output ResultFlag, R_(L2), SIG_(Out)
Return

[6949] 18 WriteM1+

[6950] Input: VectNum, WordSelect, MVal

[6951] Output: ResultFlag

[6952] Changes: M_(VectNum)

[6953] Availability: All devices

[6954] 18.1 Function description

[6955] The WriteM1+ function is used to update selected words of M1+, subject to the permissions corresponding to those words stored in P_(VectNum).

[6956] Note: Unlike WriteAuth, a signature is not required as an input to this function.

[6957] 18.2 Input Parameters

[6958] Table 267 describes each of the input parameters.

TABLE 267 Description of input parameters for WriteM1+
Parameter | Description
VectNum | Number of the memory vector to be written. Must be in range 1 to (NumVectors − 1).
WordSelect | Selection of words to be written. 0 - indicates corresponding word is not written. 1 - indicates corresponding word is to be written as per input. If WordSelect[N bit] is set, then write to M_(VectNum) word N.
MVal | Multiple of words corresponding to the number of words selected for write. Starts with LSW of M_(VectNum).

[6959] Note: Since this function has no accompanying signatures, additional input parameter error checking is required.

[6960] 18.3 Output Parameters

[6961] Table 268 describes each of the output parameters.

TABLE 268 Description of output parameters for WriteM1+
Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1.

[6962] 18.4 Function Sequence

[6963] The WriteM1+ command is illustrated by the following pseudocode:

Accept input parameters VectNum, WordSelect
# Accept MVal as per WordSelect
MValTemp[16] ← 0 # Temporary buffer to hold MVal after being read
For i ← 0 to MaxWordInM # word 0 to word 15
  If(WordSelect[i] = 1)
    Accept next MVal
    MValTemp[i] ← MVal # Store MVal in temporary buffer
  EndIf
EndFor
Check range of VectNum
If invalid
  ResultFlag ← InvalidVector
  Output ResultFlag
  Return
EndIf
# Checking non authenticated write permission for M1+
PermOK ← CheckM1+Perm(VectNum,WordSelect)
# Writing M with MVal
If(PermOK = 1)
  WriteM(VectNum,MValTemp[ ])
  ResultFlag ← Pass
Else
  ResultFlag ← InvalidPermission
EndIf
Output ResultFlag
Return

[6964] 18.4.1 PermOK CheckM1+Perm (VectNum, WordSelect)

[6965] This function checks WordSelect against permission P_(VectNum) for the selected word.

For i ← 0 to MaxWordInM # word 0 to word 15
  If(WordSelect[i] = 1) ∧ (P_(VectNum)[i] = 0) # Trying to write a ReadOnly word
    Return PermOK ← 0
  EndIf
EndFor
Return PermOK ← 1
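The permission check for M1+ writes can be sketched as a bitmask test. The sketch assumes that WordSelect and P_(VectNum) are each held as 16-bit integers with bit i corresponding to word i, and that a permission bit of 0 means ReadOnly, as in the comment above. The name check_m1plus_perm is hypothetical.

```python
def check_m1plus_perm(word_select: int, permissions: int, words: int = 16) -> bool:
    """Return True only if no selected word is ReadOnly (permission bit 0)."""
    for i in range(words):
        selected = (word_select >> i) & 1
        writable = (permissions >> i) & 1
        if selected and not writable:
            return False   # attempting to write a ReadOnly word
    return True
```

A single disallowed word rejects the entire request, so the write either proceeds for all selected words or for none.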

[6966] 18.4.2 WriteM(VectNum, MValTemp[ ])

[6967] This function copies MValTemp to M_(VectNum).

For i ← 0 to MaxWordInM # Copying word from temp buff to M
  If(VectNum = 1) # If M1
    P_(VectNum)[i] ← 0 # Set permission to ReadOnly before writing
  EndIf
  M_(VectNum)[i] ← MValTemp[i] # copy word buffer to M word
EndFor

[6968] 19 WriteFields

[6969] Input: FieldSelect, FieldVal

[6970] Output: ResultFlag

[6971] Changes: M_(VectNum)

[6972] Availability: All devices

[6973] 19.1 Function Description

[6974] The WriteFields function is used to write new data to selected fields (stored in M0). The write is carried out subject to the non-authenticated write access permissions of the fields as stored in the appropriate words of M1 (see Section 8.1.1.3).

[6975] The WriteFields function is used whenever authorization for a write (i.e. a valid signature) is not required. The WriteFieldsAuth function is used to perform authenticated writes to fields. For example, decrementing the amount of ink in an ink cartridge field is permitted by anyone via WriteFields, but incrementing it during a refill operation is only permitted using WriteFieldsAuth. Therefore WriteFields does not require a signature as one of its inputs.

[6976] 19.2 Input Parameters

[6977] Table 269 describes each of the input parameters.

TABLE 269 Description of input parameters for WriteFields
Parameter | Description
FieldSelect | Selection of fields to be written. 0 - indicates corresponding field is not written. 1 - indicates corresponding field is to be written as per input. If FieldSelect[N bit] is set, then write to Field N of M0.
FieldVal | Multiple of words corresponding to the words for all selected fields. Since Field0 starts at M0[15], FieldVal words start with MSW of lower field.

[6978] 19.3 Output Parameters

[6979] Table 270 describes each of the output parameters.

TABLE 270 Description of output parameters for WriteFields
Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1.

[6980] 19.4 Function Sequence

[6981] The WriteFields command is illustrated by the following pseudocode:

Accept input parameters FieldSelect
# Accept FieldVal as per FieldSelect into a temporary buffer MValTemp
# Find the size of each FieldNum to accept FieldData
FieldSize[16] ← 0 # Array to hold FieldSize assuming there are 16 fields
NumFields ← FindNumOfFieldsInM0(M1,FieldSize)
MValTemp[16] ← 0 # Temporary buffer to hold FieldVal after being read
For i ← 0 to NumFields
  If FieldSelect[i] = 1
    If i = 0 # Check if field number is 0
      PreviousFieldEndPos ← MaxWordInM
    Else
      PreviousFieldEndPos ← M1[i−1].EndPos # position of the last word for the previous field
    EndIf
    For j ← (PreviousFieldEndPos −1) to M1[i].EndPos
      MValTemp[j] ← Next FieldVal word # Store FieldVal in MValTemp.
    EndFor
  EndIf
EndFor
# Check non-authenticated write permissions for all fields in FieldSelect
PermOK ← CheckM0NonAuthPerm(FieldSelect,MValTemp,M0,M1)
# Writing M0 with MValTemp if permissions allow writing
If(PermOK = 1)
  WriteM(0,MValTemp)
  ResultFlag ← Pass
Else
  ResultFlag ← InvalidPermission
EndIf
Output ResultFlag
Return

[6982] 19.4.1 NumFields FindNumOfFieldsInM0(M1,FieldSize[ ])

[6983] This function returns the number of fields in M0 and an array FieldSize which stores the size of each field.

CurrPos ← 0
NumFields ← 0
FieldSize[16] ← 0 # Array storing field sizes
For FieldNum ← 0 to MaxWordInM
  If(CurrPos = 0) # check if last field has been reached
    Return FieldNum # FieldNum indicates number of fields in M0
  EndIf
  FieldSize[FieldNum] ← CurrPos − M1[FieldNum].EndPos
  If(FieldSize[FieldNum] < 0)
    Error # Integrity problem with field attributes
    Return FieldNum # Lower M0 fields are still valid but higher M0 fields are ignored
  Else
    CurrPos ← M1[FieldNum].EndPos
  EndIf
EndFor

[6984] 19.4.2 WordBitMapForField GetWordMapForField(FieldNum,M1)

[6985] This function returns the word bitmap corresponding to a field, i.e. which consecutive words the field consists of.

WordBitMapForField ← 0
WordMapTemp ← 0
PreviousFieldEndPos ← M1[FieldNum −1].EndPos # position of the last word for the previous field
For j ← (PreviousFieldEndPos +1) to M1[FieldNum].EndPos
  # Set bit corresponding to the word position
  WordMapTemp ← SHIFTLEFT(1,j)
  WordBitMapForField ← WordMapTemp ∨ WordBitMapForField
EndFor
Return WordBitMapForField

[6986] 19.4.3 PermOK CheckM0NonAuthPerm(FieldSelect,MValTemp[ ],M0,M1)

[6987] This function checks non-authenticated write permissions for all fields in FieldSelect.

PermOK CheckM0NonAuthPerm( )
FieldSize[16] ← 0
NumFields ← FindNumOfFieldsInM0(M1,FieldSize)
# Loop through all fields in FieldSelect and check their non-authenticated permission
For i ← 0 to NumFields
  If FieldSelect[i] = 1 # check selected
    WordBitMapForField ← GetWordMapForField(i,M1) # get word bitmap for field
    PermOK ← CheckFieldNonAuthPerm(i,WordBitMapForField,MValTemp,M0) # Check permission for field i in FieldSelect
    If(PermOK = 0) # Writing is not allowed, return if permissions for field do not allow writing
      Return PermOK
    EndIf
  EndIf
EndFor
Return PermOK

[6988] 19.4.4 PermOK

[6989] CheckFieldNonAuthPerm(FieldNum,WordBitMapForField,MValTemp[ ],M0)

[6990] This function checks non-authenticated write permissions for the field.

DecrementOnly ← 0
AuthRW ← M1[FieldNum].AuthRW
NonAuthRW ← M1[FieldNum].NonAuthRW
If(NonAuthRW = 0) # No NonAuth write allowed
  Return PermOK ← 0
EndIf
If((AuthRW = 0) ∧ (NonAuthRW = 1)) # NonAuthRW allowed
  Return PermOK ← 1
ElseIf((AuthRW = 1) ∧ (NonAuthRW = 1)) # NonAuth DecrementOnly allowed
  PermOK ← CheckInputDataForDecrementOnly(M0,MValTemp,WordBitMapForField)
  Return PermOK
EndIf

[6991] 19.4.5 PermOK CheckInputDataForDecrementOnly(M0,MValTemp[ ],WordBitMapForField)

[6992] This function checks that the data to be written to the field is less than the current value.

DecEncountered ← 0
LessThanFlag ← 0
EqualToFlag ← 0
For i ← MaxWordInM to 0
  If(WordBitMapForField[i] = 1) # starting word of the field - starting at MSW
    # comparing the word of temp buffer with M0 current value
    LessThanFlag ← MValTemp[i] < M0[i]
    EqualToFlag ← MValTemp[i] = M0[i]
    # new value is less than current, or a previous word has been decremented
    If(LessThanFlag = 1) ∨ (DecEncountered = 1)
      DecEncountered ← 1
      PermOK ← 1
      Return PermOK
    ElseIf(EqualToFlag ≠ 1) # Only if the value is greater than current and decrement not encountered in previous words
      PermOK ← 0
      Return PermOK
    EndIf
  EndIf
EndFor

[6993] 19.4.6 WriteM(VectNum, MValTemp[ ])

[6994] Refer to Section 18.4.2 for details.

[6995] 20 WriteFieldsAuth

[6996] Input: KeyRef, FieldSelect, FieldVal, R_(E), SIG_(E)

[6997] Output: ResultFlag

[6998] Changes: M0 and R_(L)

[6999] Availability: All devices

[7000] 20.1 Function Description

[7001] The WriteFieldsAuth command is used to securely update a number of fields (in M0). The write is carried out subject to the authenticated write access permissions of the fields as stored in the appropriate words of M1 (see Section 8.1.1.3). WriteFieldsAuth will either update all of the requested fields or none of them; the write only succeeds when all of the requested fields can be written to.

[7002] The WriteFieldsAuth function requires the data to be accompanied by an appropriate signature based on a key that has appropriate write permissions to the field, and the signature must also include the local R (i.e. nonce/challenge) as previously read from this QA Device via the Random function.

[7003] The appropriate signature can only be produced by knowing K_(KeyRef). This can be achieved by a call to an appropriate command on a QA Device that holds a key matching K_(KeyRef). Appropriate commands include SignM, XferAmount, XferField, StartXfer, and StartRollBack.

[7004] 20.2 Input Parameters

[7005] Table 271 describes each of the input parameters for WriteAuth.

Parameter | Description
KeyRef | For common key signature generation: KeyRef.keyNum = Slot number of the key to be used for testing the input signature. KeyRef.useChipId = 0. No variant key signature generation required.
FieldSelect | Selection of fields to be written. 0 - indicates corresponding field is not written. 1 - indicates corresponding field is to be written as per input. If FieldSelect[N bit] is set, then write to Field N of M0.
FieldVal | Multiple of words corresponding to the total number of words for all selected fields. Since Field0 starts at M0[15], FieldVal words start with MSW of lower field.
R_(E) | External random value used to verify input signature. This will be the R from the input signature generator (i.e. device generating SIG_(E)).
SIG_(E) | External signature required for authenticating input data. The external signature is either generated by a Translate or one of the Xfer functions. A correct SIG_(E) = SIG_(KeyRef)(data | R_(E) | R_(L)).

[7006] 20.2.1 Input Signature Verification Data Format

[7007] FIG. 373 shows the input signature verification data format for the WriteAuth function.

[7008] Table 272 gives the parameters included in SIG_(E) for WriteAuth.

Parameter | Length in bits | Value set internally | Value set from Input
RWSense | 3 | write constant = 001. Refer to Section 15.3.1.1 |
FieldNum | 4 | |
ChipID | 48 | This QA Device's ChipId |
FieldData | 32 per word | |
R_(E) | 160 | |
R_(L) | 160 | random value from device |

[7009] 20.3 Output Parameters

[7010] Table 273 describes each of the output parameters.

TABLE 273 Description of output parameters for WriteAuth
Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1.

[7011] 20.4 Function Sequence

[7012] The WriteAuth command is illustrated by the following pseudocode:

Accept input parameters - KeyRef, FieldSelect
# Accept FieldVal as per FieldSelect into a temporary buffer MValTemp
# Find the size of each FieldNum to accept FieldData
FieldSize[16] ← 0 # Array to hold FieldSize assuming there are 16 fields
NumFields ← FindNumOfFieldsInM0(M1,FieldSize)
MValTemp[16] ← 0 # Temporary buffer to hold FieldVal after being read
For i ← 0 to NumFields
  If FieldSelect[i] = 1
    If i = 0 # Check if field number is 0
      PreviousFieldEndPos ← MaxWordInM
    Else
      PreviousFieldEndPos ← M1[i−1].EndPos # position of the last word for the previous field
    EndIf
    For j ← (PreviousFieldEndPos −1) to M1[i].EndPos
      MValTemp[j] ← Next FieldVal word # Store FieldVal in MValTemp.
    EndFor
  EndIf
EndFor
Accept R_(E), SIG_(E)
Check range of KeyRef.keyNum
If invalid range
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf
# Generate message for passing to GenerateSignature function
data ← (RWSense|FieldSelect|ChipId|FieldVal)
# Generate Signature
SIG_(L) ← GenerateSignature(KeyRef,data,R_(E),R_(L)) # Refer to FIG. 373.
# Check signature
If(SIG_(L) = SIG_(E))
  Update R_(L) to R_(L2)
Else
  ResultFlag ← BadSig
  Output ResultFlag
  Return
EndIf
# Check authenticated write permission for all fields in FieldSelect using KeyRef
PermOK ← CheckM0AuthPerm(FieldSelect,MValTemp,M0,M1,KeyRef)
If(PermOK = 1)
  WriteM(0,MValTemp[ ]) # Copy temp buffer to M0
  ResultFlag ← Pass
Else
  ResultFlag ← InvalidPermission
EndIf
Output ResultFlag
Return

[7013] 20.4.1 PermOK CheckM0AuthPerm(FieldSelect, MValTemp, M0, M1, KeyRef)

[7014] This function checks authenticated write permissions for all fields in FieldSelect using KeyRef.
PermOK CheckM0AuthPerm( )
FieldSize[16] ← 0
NumFields ← FindNumOfFieldsInM0(FieldSize)
# Loop through fields
For i ← 0 to NumFields
  If FieldSelect[i] = 1 # check selected
    WordBitMapForField ← GetWordMapForField(i, M1) # get word bitmap for field
    PermOK ← CheckAuthFieldPerm(i, WordBitMapForField, MValTemp, M0, KeyRef) # Check permission for field i in FieldSelect
    If (PermOK = 0) # Writing is not allowed, return if permissions for field don't allow writing
      Return PermOK
    EndIf
  EndIf
EndFor
Return PermOK

[7015] 20.4.2 PermOK CheckAuthFieldPerm(FieldNum, WordMapForField, MValTemp, M0, KeyRef)

[7016] This function checks authenticated permissions for an M₀ field using KeyRef (whether KeyRef has write permissions to the field).
AuthRW ← M1[FieldNum].AuthRW
KeyNumAtt ← M1[FieldNum].KeyNum
If (AuthRW = 0) # Check whether any key has write permissions
  Return PermOK ← 0 # No authenticated write permissions
EndIf
# Check KeyRef has ReadWrite Permission to the field and it is locked
If (KeyLock_(KeyNum) = locked) ∧ (KeyNumAtt = KeyRef.keyNum)
  Return PermOK ← 1
Else # KeyNum is not a ReadWrite Key
  KeyPerms ← M1[FieldNum].DOForKeys # Isolate KeyPerms for FieldNum
  # Check Decrement Only Permission for Key
  If (KeyPerms[KeyRef.keyNum] = 1) # Key is allowed to Decrement field
    PermOK ← CheckInputDataForDecrementOnly(M0, MValTemp, WordMapForField)
  Else # Key is a ReadOnly key
    PermOK ← 0
  EndIf
EndIf
Return PermOK
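The decision order in CheckAuthFieldPerm can be sketched in Python as follows. This is an illustrative sketch only: the `FieldAttrs` class, the argument names and the idea of passing the key-lock state and the decrement-only check result in as plain values are assumptions made for the example, not part of the QA Device interface.

```python
from dataclasses import dataclass


@dataclass
class FieldAttrs:
    """Hypothetical container for the M1 attributes the pseudocode reads."""
    auth_rw: int      # M1[FieldNum].AuthRW
    key_num: int      # M1[FieldNum].KeyNum
    do_for_keys: int  # M1[FieldNum].DOForKeys, assumed one bit per key slot


def check_auth_field_perm(attrs, key_num, key_locked, decrement_only_ok):
    """Return 1 if the key in slot key_num may write the field, else 0.

    key_locked stands in for the KeyLock test in the pseudocode, and
    decrement_only_ok stands in for the result of
    CheckInputDataForDecrementOnly (both supplied by the caller here).
    """
    if attrs.auth_rw == 0:
        return 0  # no key has authenticated write permission
    if key_locked and attrs.key_num == key_num:
        return 1  # this key is the field's ReadWrite key
    if (attrs.do_for_keys >> key_num) & 1:
        return 1 if decrement_only_ok else 0  # decrement-only key
    return 0  # read-only key
```

For example, with `FieldAttrs(auth_rw=1, key_num=2, do_for_keys=0b1000)`, slot 2 may write freely, slot 3 may write only data that passes the decrement-only check, and any other slot is refused.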

[7017] 20.4.3 WordBitMapField GetWordMapForField(FieldNum,M1)

[7018] Refer to Section 19.4.2 for details.

[7019] 20.4.4 PermOK CheckInputDataForDecrementOnly(M0, MValTemp, WordMapForField)

[7020] Refer to Section 19.4.5 for details.

[7021] 20.4.5 WriteM(VectNum, MValTemp)

[7022] Refer to Section 18.4.2 for details.

[7023] 21 SetPerm

[7024] Input: VectNum, PermVal

[7025] Output: ResultFlag, NewPerm

[7026] Changes: P_(n)

[7027] Availability: All devices

[7028] 21.1 Function Description

[7029] The SetPerm command is used to update the contents of P_(VectNum)(which stores the permission for M_(VectNum)).

[7030] The new value for P_(VectNum) is a combination of the old and newpermissions in such a way that the more restrictive permission for eachpart of P_(VectNum) is kept.

[7031] M0's permissions are set by M1 therefore they can't be changed.

[7032] M1's permissions cannot be changed by SetPerm. M1 is a write-oncememory vector and its permissions are set by writing to it.

[7033] See Section 8.1.1.3 and Section 8.1.1.5 for more informationabout permissions.

[7034] 21.2 Input Parameters

[7035] Table 274 describes each of the input parameters for SetPerm.Parameter Description VectNum Number of the memory vector whosepermission is being changed. PermVal Bitmap of permission for thecorresponding Memory Vector.

[7036] Note: Since this function has no accompanying signatures,additional input parameter error checking is required.

[7037] 21.3 Output Parameters

[7038] Table 275 describes each of the output parameters for SetPerm.Parameter Description ResultFlag Indicates whether the functioncompleted successfully or not. If it did not complete successfully, thereason for the failure is returned here. See Section 12.1. Perm IfVectNum = 0, then no Perm is returned. If VectNum = 1, then old Perm isreturned. If VectNum > 1, then new Perm is returned after P_(VectNum)has been changed based on PermVal.

[7039] 21.4 Function Sequence

[7040] The SetPerm command is illustrated by the following pseudocode:
Accept input parameters - VectNum, PermVal
Check range of VectNum
If invalid
  ResultFlag ← InvalidVector
  Output ResultFlag
  Return
EndIf
If (VectNum = 0) # No permissions for M0
  ResultFlag ← Pass
  Output ResultFlag
  Return
ElseIf (VectNum = 1)
  ResultFlag ← Pass
  Output ResultFlag
  Output P₁
  Return
ElseIf (VectNum > 1)
  # Check that only 'RW' parts are being changed
  # RW(1) → RO(0), RO(0) → RO(0), RW(1) → RW(1) - valid change
  # RO(0) → RW(1) - invalid change
  # checking for change from ReadOnly to ReadWrite
  temp ← ˜P_(VectNum) ∧ PermVal
  If (temp = 1) # If invalid change is 1
    ResultFlag ← InvalidPermission
    Output ResultFlag
  Else
    P_(VectNum) ← PermVal
    ResultFlag ← Pass
    Output ResultFlag
    Output P_(VectNum)
  EndIf
  Return
EndIf
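The one-way nature of the permission update (ReadWrite may become ReadOnly, never the reverse) can be illustrated with a short Python sketch. The list `perms`, the function name and the returned tuple are assumptions made for illustration, and the test `temp = 1` in the pseudocode is read here as "any bit of temp is set".

```python
def set_perm(perms, vect_num, perm_val):
    """Sketch of SetPerm. perms[n] models P_n as an integer bitmap
    (1 = ReadWrite, 0 = ReadOnly). Returns (ResultFlag, Perm or None)."""
    if not 0 <= vect_num < len(perms):
        return "InvalidVector", None
    if vect_num == 0:
        return "Pass", None        # M0 permissions are set by M1
    if vect_num == 1:
        return "Pass", perms[1]    # M1 is write-once; old Perm is returned
    # A bit that is 0 (RO) in the stored permission and 1 (RW) in PermVal
    # would be a RO -> RW change, which is not allowed.
    if ~perms[vect_num] & perm_val:
        return "InvalidPermission", None
    perms[vect_num] = perm_val
    return "Pass", perms[vect_num]
```

With `perms = [0, 0b1111, 0b1010]`, calling `set_perm(perms, 2, 0b0010)` succeeds and leaves P₂ more restrictive, while a later `set_perm(perms, 2, 0b0110)` is rejected because it tries to turn a ReadOnly bit back into ReadWrite.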

[7041] 22 ReplaceKey

[7042] Input: KeyRef, KeyId, KeyLock, EncryptedKey,R_(E), SIG_(E)

[7043] Output: ResultFlag

[7044] Changes: K_(KeyRef.keyNum) and R_(L)

[7045] Availability: All devices

[7046] 22.1 Function Description

[7047] The ReplaceKey command is used to replace the contents of a non-locked keyslot, which means replacing the key, its associated KeyId, and the lock status bit for the keyslot. A key can only be replaced if the slot has not been locked, i.e. the KeyLock for the slot is 0. The procedure for replacing a key also requires knowledge of the value of the current key in the keyslot, i.e. you can only replace a key if you know the current key.

[7048] Whenever the ReplaceKey function is called, the caller has the ability to make this new key the final key for the slot. This is accomplished by passing in a new value for the KeyLock flag. A new KeyLock flag of 0 keeps the slot unlocked, and permits further replacements. A new KeyLock flag of 1 means the slot is now locked, with the new key as the final key for the slot, i.e. no further key replacement is permitted for that slot.

[7049] 22.2 Input Parameters

[7050] Table 276 describes each of the input parameters for ReplaceKey.
Parameter: Description
KeyRef: For common key signature generation: KeyRef.keyNum = Slot number of the key to be used for testing the input signature, and which will be replaced by the new key. KeyRef.useChipId = 0. No variant key signature generation required.
KeyId: KeyId of the new key. The LSB represents whether the new key is a variant or a common key.
KeyLock: Flag indicating whether the new key should be the final key for the slot or not (1 = final key, 0 = not final key).
EncryptedKey: SIG_(Kold)(R_(E)|R_(L)) ⊕ K_(new) where K_(old) = KeyRef.getkey( ). Refer to Section 10.1.3.1.
R_(E): External random value required for verifying input signature. This will be the R from the input signature generator (device generating SIG_(E)). In this case the input signature is generated by calling the GetProgramKey function on a Key Programming device.
SIG_(E): External signature required for authenticating input data and determining the new key from the EncryptedKey.

[7051] 22.2.1 Input Signature Generation Data Format

[7052]FIG. 374 shows the input signature generation data format for theReplaceKey function.

[7053] Table 277 gives the parameters included in SIG_(E) forReplaceKey. Length Value set Value set Parameter in bits internally fromInput ChipId 48 This QA Device's ChipId KeyId 32 • R_(E) 160 •EncryptedKey 160 •

[7054] 22.3 Output Parameters

[7055] Table 278 describes each of the output parameters for ReplaceKey.Parameter Description ResultFlag Indicates whether the functioncompleted successfully or not. If it did not complete successfully, thereason for the failure is returned here. See Section 12.1.

[7056] 22.4 Function Sequence

[7057] The ReplaceKey command is illustrated by the following pseudocode:
Accept input parameters - KeyRef, KeyId, KeyLock, EncryptedKey, R_(E), SIG_(E)
Check KeyRef.keyNum range
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf
# Generate message for passing to GenerateSignature function
data ← (ChipId|KeyId|KeyLock|R_(E)|EncryptedKey)
# Generate Signature
SIG_(L) ← GenerateSignature(KeyRef, data, Null, Null) # Refer to Figure 374.
# Check if the key slot is unlocked
If (KeyLock ≠ unlock)
  ResultFlag ← KeyAlreadyLocked
  Output ResultFlag
  Return
EndIf
# Test SIG_(E)
If (SIG_(L) ≠ SIG_(E))
  ResultFlag ← BadSig
  Output ResultFlag
  Return
EndIf
SIG_(L) ← GenerateSignature(Key, null, R_(E), R_(L))
Advance R_(L)
# Must be atomic - must not be possible to remove power and have KeyId and KeyNum mismatched. Also preferable for KeyLock, although not strictly required.
K_(KeyNum) ← SIG_(L) ⊕ EncryptedKey
KeyId_(KeyNum) ← KeyId
KeyLock_(KeyNum) ← KeyLock
ResultFlag ← Pass
Output ResultFlag
Return
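The way the new key is recovered from EncryptedKey can be sketched in Python. The sketch assumes, purely for illustration, that the 160-bit signature function is HMAC-SHA1 over the concatenated random values; the section itself only gives the signature length, so the choice of HMAC-SHA1 and all function names below are assumptions.

```python
import hashlib
import hmac


def sig(key: bytes, *parts: bytes) -> bytes:
    """Stand-in 160-bit keyed signature (HMAC-SHA1 is an assumption)."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_new_key(k_old, k_new, r_programmer, r_target):
    """Key programming device side: mask K_new with a signature under K_old."""
    return xor(sig(k_old, r_programmer, r_target), k_new)


def decrypt_new_key(k_old, encrypted_key, r_programmer, r_target):
    """Device being programmed: regenerate the same mask and remove it.

    Here r_programmer is R_E and r_target is R_L from this device's view.
    """
    return xor(sig(k_old, r_programmer, r_target), encrypted_key)
```

Because the mask depends on the old key and on both random values, only a device that already holds the old key can recover the new key, and an EncryptedKey produced for one pair of random values does not decrypt correctly under another.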

[7058] 23 SignM

[7059] Input: KeyRef, FieldSelect, FieldValLength, FieldVal, ChipId,R_(E)

[7060] Output: ResultFlag, R_(L), SIG_(out)

[7061] Changes: R_(L)

[7062] Availability: Trusted device only

[7063] 23.1 Function Description

[7064] The SignM function is used to generate the appropriate digitalsignature required for the authenticated write function WriteFieldsAuth.The SignM function is used whenever the caller wants to write a newvalue to a field that requires key-based write access.

[7065] The caller typically passes the new field value as input to the SignM function, together with the nonce (R_(E)) from the QA Device that will receive the generated signature. The SignM function then produces the appropriate signature SIG_(out). Note that SIG_(out) may need to be translated via the Translate function on its way to the final WriteFieldsAuth QA Device.

[7066] The SignM function is typically used by the system to updatepreauthorisation fields (Section 31.4.3).

[7067] The key used to produce output signature SIG_(out) depends onwhether the trusted device shares a common key or a variant key with theQA Device directly receiving the signature. The KeyRef object passedinto the interface must be set appropriately to reflect this.

[7068] 23.2 Input Parameters

[7069] Table 279 describes each of the input parameters for SignM.
Parameter: Description
KeyRef: For generating common key output signature: KeyRef.keyNum = Slot number of the key for producing the output signature. SIG_(out) is produced using K_(KeyRef.keyNum) because the device receiving SIG_(out) shares K_(KeyRef.keyNum) with the trusted device. KeyRef.useChipId = 0. For generating variant key output signature: KeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(out) is produced using a variant of K_(KeyRef.keyNum) because the device receiving SIG_(out) shares a variant of K_(KeyRef.keyNum) with the trusted device. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which receives SIG_(out).
FieldNum: Field number of the field that will be written to.
FieldDataLength: The length of the FieldData in words.
FieldData: The value that will be written to the field selected by FieldNum.
R_(E): External random value used in the output signature generation. R_(E) is obtained by calling the Random function on the device which will receive the SIG_(out) from the SignM function, which in this case is the WriteAuth function or the Translate function.
ChipId: Chip identifier of the device whose WriteAuth function will be called subsequently to perform an authenticated write to its FieldNum of M0.

[7070] 23.3 Output Parameters

[7071] Table 280 describes each of the output parameters. TABLE 280Description of output parameters for SignM Parameter DescriptionResultFlag Indicates whether the function completed successfully or not.If it did not complete successfully, the reason for the failure isreturned here. See Section 12.1. R_(L) Internal random value used in theoutput signature. SIG_(out) SIG_(out) = SIG_(KeyRef)(data | R_(L) |R_(E)) as shown in FIG. 373. As per FIG. 373, R_(E) is actually R_(L)and R_(L) is R_(E) with respect to device producing SIG_(out) to beapplied to WriteAuth function.

[7072] 23.3.1 SIG_(out)

[7073] Refer to Section 20.2.1.

[7074] 23.4 Function Sequence

[7075] The SignM command is illustrated by the following pseudocode:
Accept input parameters - KeyRef, FieldNum, FieldDataLength
# Accept FieldData words
For i = 0 to FieldValLength
  Accept next FieldData
EndFor
Accept ChipId, R_(E)
Check KeyRef.keyNum range
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf
# Generate message for passing into the GenerateSignature function
data ← (RWSense|FieldSelect|ChipId|FieldVal)
# Generate Signature
SIG_(out) ← GenerateSignature(KeyRef, data, R_(L), R_(E)) # Refer to Section 20.2.1.
Advance R_(L) to R_(L2)
ResultFlag ← Pass
Output parameters ResultFlag, R_(L), SIG_(out)
Return

[7076] FUNCTIONS ON A

[7077] KEY PROGRAMMING QA DEVICE

[7078] 24 Concepts

[7079] The key programming device is used to replace keys in otherdevices.

[7080] The key programming device stores both the old key which will bereplaced in the device being programmed, and the new key which willreplace the old key in the device being programmed. The keys reside innormal key slots of the key programming device.

[7081] Any key stored in the key programming device can be used as anold key or a new key for the device being programmed, provided it ispermitted by the key replacement map stored within the key programmingdevice.

[7082] FIG. 375 is a representation of a key replacement map. The 1s indicate that the new key is permitted to replace the old key. The 0s indicate that key replacement is not permitted for those positions. The positions in FIG. 13 which are blank indicate a 0.

[7083] According to the key replacement map in FIG. 13, K₅ can replace K₁, K₆ can replace K₃, K₄, K₅, K₇, K₃ can replace K₂, K₀ can replace K₂, and K₂ can replace K₆. No key can replace itself.

[7084] FIG. 375. Key Replacement Map

[7085] The key replacement map must be readable from an external system and must be updateable by an authenticated write. Therefore, the key replacement map must be stored in an M0 field. This requires one of the keys residing in the key programming device to have ReadWrite access to the key replacement map. This key is referred to as the key replacement map key and is used to update the key replacement map.

[7086] There will be one key replacement map field in a key programming device.

[7087] No key replacement mappings are allowed to the key replacement map key because it should not be used in another device being programmed. To prevent the key replacement map key from being used in key replacement, in case the mapping has been accidentally changed, the key replacement map key is allocated a fixed key slot of 0 in all key programming devices. If a GetProgramKey function is invoked on the key programming device with the key replacement map key slot number 0, it immediately returns an error, even before the key replacement map is checked.

[7088] The keys K₀ to K₇ in the key programming device are initially set during the instantiation of the key programming device. Thereafter, any key can be replaced on the key programming device by another key programming device. If a key in a key slot of the key programming device is being replaced, the key replacement map for the old key must be invalidated automatically. This is done by setting the row and column for the corresponding key slot to 0. For example, if K₁ is replaced, then column 1 and row 1 are set to 0, as indicated in FIG. 376.

[7089] The new mapping information for K₁ is then entered by performing an authenticated write of the key replacement map field using the key replacement map key.

[7090] 24.1 Key Replacement Map Data Structure

[7091] As mentioned in Section 24, the key replacement map must bereadable by external systems and must be updateable using anauthenticated write by the key replacement map key. Therefore, the keyreplacement map is stored in an M0 field of the key programming device.The map is 8×8 bits in size and therefore can be stored in a two wordfield. The LSW of key replacement map stores the mappings for K₀-K₃. TheMSW of key replacement map stores the mappings for K₄-K₇. Referring toFIG. 375, key replacement map LSW is 0x40092000 and MSW is 0x40224040.Referring to FIG. 376, after K₁ is replaced in the key programmingdevice, the value of the key replacement map LSW is 0x40090000 and MSWis 0x40224040.

[7092] The key replacement map field has an M1 word representing itsattributes. The attribute setting for this field is specified in Table281. TABLE 281 Key replacement map attribute setting Attribute nameValue Explanation Type TYPE_KEY_MAP Indicates that the field value Referto represents a key replacement map. Appendix A. Only one such field perkey programming QA Device. KeyNum 0 Slot number of the key replacementmap key. NonAuthRW 0 No non authenticated writes is permitted. AuthRW 1Authenticated write is permitted. KeyPerms 0 No Decrement Onlypermission for any key. EndPos Value such that field size is 2 words

[7093] 24.2 Basic Scheme

[7094] The Key Replacement sequence is shown in FIG. 377.

[7095] Following is a sequential description of the key replacement process:

[7096] 1. The System gets a Random number from the QA Device whose keysare going to be replaced.

[7097] 2. The System makes a GetProgramKey Request to the Key Programming QA Device. The Key Programming QA Device must contain both keys for the QA Device whose keys are being replaced: the Old Keys, which are the keys that exist currently (before key replacement), and the New Keys, which are the keys which the QA Device will have after successful processing of the ReplaceKey Request. The GetProgramKey Request is called with the Key Number of the Old Key (in the Key Programming QA Device), the Key Number of the New Key (in the Key Programming QA Device), and the Random number from (1). The Key Programming QA Device validates the GetProgramKey Request based on the key replacement map, and then produces the necessary GetProgramKey Output. The GetProgramKey Output consists of the encrypted New Key (encryption done using the Old Key), along with a signature using the Old Key.

[7098] 3. The System then applies GetProgramKey Output to the QA Devicewhose key is being replaced, by calling the ReplaceKey function on it,passing in the GetProgramKey Output. The ReplaceKey function willdecrypt the encrypted New Key using the Old Key, and then replace itsOld Key with the decrypted New Key.

[7099] 25 Functions

[7100] 25.1 GetProgramKey

[7101] Input: OldKeyRef, ChipId, R_(E), KeyLock, NewKeyRef

[7102] Output: ResultFlag, R_(L), EncryptedKey, KeyIdOfNewKey, SIG_(out)

[7103] Changes: R_(L)

[7104] Availability: Key programming device

[7105] 25.1.1 Function Description

[7106] The GetProgramKey works in conjunction with the ReplaceKeycommand, and is used to replace the specified key and its KeyId. Thisfunction is available on a key programming device and produces thenecessary inputs for the ReplaceKey function. The ReplaceKey command isthen run on the device whose key is being replaced.

[7107] The key programming device must have both the old key and the new key programmed as its keys, and the key replacement map stored in one of its M0 fields, before GetProgramKey can be called on the device.

[7108] Depending on the OldKeyRef object and the NewKeyRef object passedin, the GetProgramKey will produce a signature to replace a common keyby a common key, a variant key by a common key, a common key by avariant key or a variant key by a variant key.

[7109] 25.1.2 Input Parameters

[7110] Table 282 describes each of the input parameters for GetProgramKey.
Parameter: Description
OldKeyRef: Old key is a common key: OldKeyRef.keyNum = Slot number of the old key in the Key Programming QA Device. The device whose key is being replaced shares a common key K_(OldKeyRef.keyNum) with the key programming device. OldKeyRef.useChipId = 0. Old key is a variant key: OldKeyRef.keyNum = Slot number of the old key in the Key Programming QA Device that will be used to generate the variant key. The device whose key is being replaced shares a variant of K_(OldKeyRef.keyNum) with the key programming device. OldKeyRef.useChipId = 1. OldKeyRef.chipId = ChipId of the device whose variant of K_(OldKeyRef.keyNum) is being replaced.
ChipId: Chip identifier of the device whose key is being replaced.
R_(E): External random value which will be used in output signature generation. R_(E) is obtained by calling the Random function on the device being programmed. This device will also receive the SIG_(out) from the GetProgramKey function. SIG_(out) is passed in to the ReplaceKey function.
KeyLock: Flag indicating whether the new key should be unlocked/locked into its slot.
NewKeyRef: New key is a common key: NewKeyRef.keyNum = Slot number of the new key in the Key Programming QA Device. The device whose key is being replaced will receive a common key K_(NewKeyRef.keyNum) from the key programming device. NewKeyRef.useChipId = 0. New key is a variant key: NewKeyRef.keyNum = Slot number of the new key in the Key Programming QA Device that will be used to generate the new variant key. The device whose key is being replaced will receive a new key which is a variant of K_(NewKeyRef.keyNum) from the key programming device. NewKeyRef.useChipId = 1. NewKeyRef.chipId = ChipId of the device receiving a new key; the new key is a variant of K_(NewKeyRef.keyNum).

[7111] 25.1.3 Output Parameters

[7112] Table 283 describes each of the output parameters forGetProgramKey. Parameter Description ResultFlag Indicates whether thefunction completed successfully or not. If it did not completesuccessfully, the reason for the failure is returned here. See Section12.1 and Table 284 R_(L) Internal random value used in the outputsignature. EncryptedKey SIG_(Kold)(R_(L)|R_(E)) ⊕ K_(new) KeyIdOfNewKeyKeyId of the new key. The LSB represents whether the new key is avariant or a common key. SIG_(out) SIG_(out) = SIG_(Kold)(data | R_(L) |R_(E))

[7113] TABLE 284 ResultFlag definitions for GetProgramKey Result FlagDescription InvalidKeyReplacementMap Key replacement map field invalidor doesn't exist. KeyReplacementNotAllowed Key replacement not allowedas per key replacement map.

[7114] 25.1.3.1 SIG_(out)

[7115]FIG. 378 shows the output signature generation data format for theGetProgramKey function.

[7116] 25.1.4 Function Sequence

[7117] The GetProgramKey command is illustrated by the following pseudocode:
Accept input parameters - OldKeyRef, ChipId, R_(E), KeyLock, NewKeyRef

# key replacement map key stored in K0, must not be used for key replacement
If (OldKeyRef.keyNum = 0) ∨ (NewKeyRef.keyNum = 0)
  ResultFlag ← Fail
  Output ResultFlag
  Return
EndIf

CheckRange(OldKeyRef.keyNum)
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf

CheckRange(NewKeyRef.keyNum)
If invalid
  ResultFlag ← InvalidKey
  Output ResultFlag
  Return
EndIf

# Find M0 words that represent the key replacement map
WordSelectForKeyMapField ← GetWordSelectForKeyMapField(M1)
If (WordSelectForKeyMapField = 0)
  ResultFlag ← InvalidKeyReplacementMap
  Output ResultFlag
  Return
EndIf

# CheckMapPermits key replacement
ReplaceOK ← CheckMapPermits(WordSelectForKeyMapField, OldKeyNum, NewKeyNum)
If (ReplaceOK = 0)
  ResultFlag ← KeyReplacementNotAllowed
  Output ResultFlag
  Return
EndIf

# All checks are OK, now generate Signature with OldKey
SIG_(L) ← GenerateSignature(OldKeyRef, null, R_(L), R_(E))
# Get new key
K_(NewKey) ← NewKeyRef.getKey( )
# Generate Encrypted Key
EncryptedKey ← SIG_(L) ⊕ K_(NewKey)
# Set base key or variant key - bit 0 of KeyId
If (NewKeyRef.useChipId = 1)
  KeyId ← 0x0001 ∧ 0x0001
Else
  KeyId ← 0x0001 ∧ 0x0000
EndIf
# Set the new key KeyId to the KeyId - bits 1-30 of KeyId
KeyIdOfNewKey ← SHIFTLEFT(KeyIdOfNewKey, 1)
KeyId ← KeyId ∨ KeyIdOfNewKey
# Set the KeyLock as per input - bit 31 of KeyId
KeyLock ← SHIFTLEFT(KeyLock, 31)
#KeyId ← KeyId ∨ KeyLock
# Generate message for passing in to the GenerateSignature function
data ← ChipId|KeyId|R_(L)|EncryptedKey
# Generate output signature
SIG_(out) ← GenerateSignature(OldKeyRef, data, null, null) # Refer to Figure 378
Advance R_(L) to R_(L2)
ResultFlag ← Pass
Output ResultFlag, R_(L), SIG_(out), KeyId, EncryptedKey
Return
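The bit layout of the 32-bit KeyId word described in the pseudocode comments (bit 0 for common or variant, bits 1-30 for the KeyId of the new key, bit 31 for KeyLock) can be sketched in Python. The function names are hypothetical, and the sketch assumes that the KeyLock bit is in fact merged into the word as the "bit 31 of KeyId" comment describes.

```python
def pack_key_id(key_id_of_new_key: int, is_variant: bool, key_lock: int) -> int:
    """Pack the fields into one 32-bit word, following the comments above."""
    word = 1 if is_variant else 0                   # bit 0: variant (1) or common (0)
    word |= (key_id_of_new_key & 0x3FFFFFFF) << 1   # bits 1-30: KeyId of the new key
    word |= (key_lock & 1) << 31                    # bit 31: KeyLock
    return word


def unpack_key_id(word: int):
    """Inverse of pack_key_id; returns (key_id_of_new_key, is_variant, key_lock)."""
    return (word >> 1) & 0x3FFFFFFF, bool(word & 1), (word >> 31) & 1
```

For instance, `pack_key_id(0x5, True, 1)` yields `0x8000000B`: the variant bit and lock bit are set, and the value 5 sits in bits 1-30.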

[7118] 25.1.4.1 WordSelectForField GetWordSelectForKeyMapField(M1)

[7119] This function gets the words corresponding to the key replacement map in M0.
FieldSize[16] ← 0 # Array to hold FieldSize assuming there are 16 fields
NumFields ← FindNumberOfFieldsInM0(M1, FieldSize)
# Find the key replacement map field
For i ← 0 to NumFields
  If (TYPE_KEY_MAP = M1[i].Type) # Field is key map field
    MapFieldNum ← i
    Return
  EndIf
EndFor
# Get the words corresponding to the key replacement map
WordMapForField ← GetWordMapForField(MapFieldNum, M1)
Return WordSelectForField

[7120] 25.1.4.2 NumFields FindNumOfFieldsInM0(M1, FieldSize)

[7121] Refer to Section 19.4.1 for details.

[7122] 25.1.4.3 WordMapForField GetWordMapForField(FieldNum,M1)

[7123] Refer to Section 19.4.2 for details.

[7124] 25.1.4.4 ReplaceOK CheckMapPermits(WordSelectForKeyMapField,OldKeyNum, NewKeyNum,M0)

[7125] This function checks whether the key replacement map permits key replacement.
# Isolate KeyReplacementMap based on WordSelectForKeyMapField and M0
KeyReplacementMap[64 bit]
# Isolate permission bit corresponding to NewKeyNum in the map for OldKeyNum
ReplaceOK ← KeyReplacementMap[(OldKeyNum × 8 + NewKeyNum) bit]
Return ReplaceOK
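The map lookup can be sketched in Python by holding the 8×8 map as a single 64-bit integer. The bit numbering (bit 0 as the least significant bit of the 64-bit value) and the helper names are assumptions made for this example; the section does not state how bit positions map onto the LSW and MSW of the field.

```python
def map_bit(old_key_num: int, new_key_num: int) -> int:
    """Bit position used by CheckMapPermits: OldKeyNum x 8 + NewKeyNum."""
    return old_key_num * 8 + new_key_num


def permit(key_map: int, old_key_num: int, new_key_num: int) -> int:
    """Return a copy of the map in which new_key_num may replace old_key_num."""
    return key_map | (1 << map_bit(old_key_num, new_key_num))


def check_map_permits(key_map: int, old_key_num: int, new_key_num: int) -> int:
    """Return 1 if the map permits the replacement, else 0."""
    return (key_map >> map_bit(old_key_num, new_key_num)) & 1
```

A map built with `permit(0, 1, 5)` allows K₅ to replace K₁ but not the reverse, since each direction has its own bit.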

[7126] 25.2 ReplaceKey

[7127] Input: KeyRef, KeyId, KeyLock, EncryptedKey,R_(E), SIG_(E)

[7128] Output: ResultFlag

[7129] Changes: K_(KeyNum) and R_(L)

[7130] Availability: Key programming device

[7131] 25.2.1 Function Description

[7132] This function is used for replacing a key in a key programming device and is similar to the generic ReplaceKey function (refer to Section 22), with an additional step of setting the KeyRef.keyNum column and KeyRef.keyNum row of the key replacement map to 0.

[7133] 25.2.2 Input Parameters

[7134] Refer to Section 22.

[7135] 25.2.3 Output Parameters

[7136] Refer to Section 22.

[7137] 25.2.4 Function Sequence

[7138] The ReplaceKey command is illustrated by the following pseudocode:
Accept input parameters - KeyRef, KeyId, EncryptedKey, R_(E), SIG_(E)
# Generate message for passing into GenerateSignature function
data ← (ChipId|KeyId|R_(E)|EncryptedKey) # Refer to Figure 374.

# Validate KeyRef, and then verify signature
ResultFlag = ValidateKeyRefAndSignature(KeyRef, data, R_(E), R_(L))
If (ResultFlag ≠ Pass)
  Output ResultFlag
  Return
EndIf

# Check if the key slot is unlocked
Isolate KeyLock for KeyRef
If (KeyLock = lock)
  ResultFlag ← KeyAlreadyLocked
  Output ResultFlag
  Return
EndIf
SIG_(L) ← GenerateSignature(Key, Null, R_(E), R_(L))
Advance R_(L)
# Find M0 words that represent the key replacement map
WordSelectForKeyMapField ← GetWordSelectForKeyMapField(M1)
# Set the bits corresponding to the KeyRef.keyNum row and column to 0,
# i.e. invalidate the key replacement map for KeyRef.keyNum.
# Must be done before the key is replaced and must be atomic with key replacement.
SetFlag ← SetKeyMapForKeyNum(WordSelectForKeyMapField, KeyRef.keyNum, M0)
If (SetFlag = 1)
  # Must be atomic - must not be possible to remove power and have KeyId and KeyNum mismatched
  K_(KeyNum) ← SIG_(L) ⊕ EncryptedKey
  KeyId_(KeyNum) ← KeyId
  KeyLock_(KeyNum) ← KeyLock
  ResultFlag ← Pass
Else
  ResultFlag ← Fail
EndIf
Output ResultFlag
Return

[7139] 25.2.4.1 WordSelectForField GetWordSelectForKeyMapField(M1)

[7140] Refer to Section 25.1.4.1 for details.

[7141] 25.2.4.2 SetFlagSetKeyMapForKeyNum(WordSelectForKeyMapField,KeyNum, M0)

[7142] This function invalidates the key replacement map for KeyNum.
# Isolate KeyReplacementMap based on WordSelectForKeyMapField and M0
KeyReplacementMap[64 bit]
# Set KeyNum row (all bits) to 0 in the KeyReplacementMap
For i = 0 to 7
  KeyReplacementMap[(KeyNum × 8 + i) bit] ← 0
EndFor
# Set KeyNum column to 0 in the KeyReplacementMap
For i = 0 to 7
  KeyReplacementMap[(i × 8 + KeyNum) bit] ← 0
EndFor
SetFlag ← 1
Return SetFlag
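The row-and-column invalidation can be sketched in Python on a 64-bit integer. As before, treating bit 0 as the least significant bit of the map is an assumption for illustration, and `invalidate_key` is a hypothetical name.

```python
def invalidate_key(key_map: int, key_num: int) -> int:
    """Clear row key_num and column key_num of an 8x8 key replacement map
    held as a 64-bit integer, returning the updated map."""
    for i in range(8):
        key_map &= ~(1 << (key_num * 8 + i))  # row: key_num as the old key
        key_map &= ~(1 << (i * 8 + key_num))  # column: key_num as the new key
    return key_map
```

After `invalidate_key(m, 1)`, no key may replace K₁ and K₁ may replace no key, while entries that do not involve K₁ are left unchanged.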

[7143] Functions

[7144] Upgrade Device

[7145] (Ink Re/fill)

[7146] 26 Concepts

[7147] 26.1 Purpose

[7148] In a printing application, an ink cartridge contains an Ink QADevice storing the ink-remaining values for that ink cartridge. Theink-remaining values decrement as the ink cartridge is used to print.

[7149] When an ink cartridge is physically re/filled, the Ink QA Deviceneeds to be logically re/filled as well. Therefore, the main purpose ofan upgrade is to re/fill the ink-remaining values of an Ink QA Device inan authorised manner.

[7150] The authorisation for a re/fill is achieved by using a ValueUpgrader QA Device which contains all the necessary functions tore/write to the Ink QA Device. In this case, the value upgrader iscalled an Ink Refill QA Device, which is used to fill/refill ink amountin an Ink QA Device.

[7151] When an Ink Refill QA Device increases (additive) the amount ofink-remaining in an Ink QA Device, the amount of ink-remaining in theInk Refill QA Device is correspondingly decreased. This means that theInk Refill QA Device can only pass on whatever ink-remaining value ititself has been issued with. Thus an Ink Refill QA Device can itself bereplenished or topped up by another Ink Refill QA Device.

[7152] The Ink Refill QA Device can also be referred to as the UpgradingQA Device, and the Ink QA Device can also be referred to as the QADevice being upgraded.

[7153] The refill of ink can also be referred to as a transfer of ink, or transfer of amount/value, or an upgrade.

[7154] Typically, the logical transfer of ink is done only after aphysical transfer of ink is successful.

[7155] 26.2 Requirements

[7156] The transfer process has two basic requirements:

[7157] The transfer can only be performed if the transfer request is valid. The validity of the transfer request must be completely checked by the Ink Refill QA Device, before it produces the required output for the transfer. It must not be possible to apply the transfer output to the Ink QA Device, if the Ink Refill QA Device has already been rolled back for that particular transfer.

[7158] A process of rollback is available if the transfer was not received by the Ink QA Device. A rollback is performed only if the rollback request is valid. The validity of the rollback request must be completely checked by the Ink Refill QA Device, before it adjusts its value to a previous value before the transfer request was issued. It must not be possible to rollback an Ink Refill QA Device for a transfer which has already been applied to the Ink QA Device, i.e. the Ink Refill QA Device must only be rolled back for transfers that have actually failed.

[7159] 26.3 Basic Scheme

[7160] The transfer and rollback process is shown in FIG. 379.

[7161] Following is a sequential description of the transfer and rollback process:

[7162] 1. The System Reads the memory vectors M0 and M1 of the Ink QA Device. The output from the Read, which includes the M0 and M1 words of the Ink QA Device and a signature, is passed as an input to the Transfer Request. It is essential that M0 and M1 are read together. This ensures that the field information for the M0 fields is correct, and has not been modified or substituted from another device. The entire M0 and M1 must be read to verify the correctness of the subsequent Transfer Request by the Ink Refill QA Device.

[7163] 2. The System makes a Transfer Request to the Ink Refill QA Device with the amount that must be transferred, the field in the Ink Refill QA Device the amount must be transferred from, and the field in the Ink QA Device the amount must be transferred to. The Transfer Request also includes the output from the Read of the Ink QA Device. The Ink Refill QA Device validates the Transfer Request based on the Read output, checks that it has enough value for a successful transfer, and then produces the necessary Transfer Output. The Transfer Output typically consists of new field data for the field being refilled or upgraded, additional field data required to ensure the correctness of the transfer/rollback, and a signature.

[7164] 3. The System then applies the Transfer Output to the Ink QA Device by calling an authenticated Write function on it, passing in the Transfer Output. The Write is either successful or not. If the Write is not successful, the System will repeat the call to the Write function using the same Transfer Output, which may be successful or not. If the retries are also unsuccessful, the System will initiate a rollback of the transfer. The rollback must be performed on the Ink Refill QA Device, so that it can adjust its value to the previous value held before the current Transfer Request was initiated. It is not necessary to perform a rollback immediately after a failed transfer. The Ink QA Device can still be used to print if there is any ink remaining in it.

[7165] 4. The System starts a rollback by Reading the memory vectors M0 and M1 of the Ink QA Device.

[7166] 5. The System makes a StartRollBack Request to the Ink Refill QA Device with the same input parameters as the Transfer Request, and the output from the Read in (4). The Ink Refill QA Device validates the StartRollBack Request based on the Read output, and then produces the necessary Pre-rollback Output. The Pre-rollback Output consists only of additional field data along with a signature.

[7167] 6. The System then applies the Pre-rollback Output to the Ink QA Device by calling an authenticated Write function on it, passing in the Pre-rollback Output. The Write is either successful or not. If the Write is not successful, then either (6), or (5) and (6), must be repeated.

[7168] 7. The System then Reads the memory vectors M0 and M1 of the Ink QA Device.

[7169] 8. The System makes a RollBack Request to the Ink Refill QA Device with the same input parameters as the Transfer Request, and the output from the Read in (7). The Ink Refill QA Device validates the RollBack Request based on the Read output, and then rolls back its field corresponding to the transfer.
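
The eight steps above can be sketched as a small simulation. This is only an illustrative model under stated assumptions: signatures, key handling and the M1 field information are omitted, each device is reduced to a single ink field plus the two sequence fields, and the names InkDevice, RefillDevice, lost_writes and transfer are hypothetical rather than part of the QA Chip Logical Interface.

```python
from dataclasses import dataclass
from typing import Optional

INIT = 0xFFFFFFFF  # assumed initial value of both sequence fields


@dataclass
class InkDevice:
    """Toy Ink QA Device: one ink field plus SEQ_1/SEQ_2, no signatures."""
    ink: int = 0
    seq1: int = INIT
    seq2: int = INIT
    lost_writes: int = 0  # hypothetical fault injection: number of writes that fail

    def read(self) -> dict:
        return {"ink": self.ink, "seq1": self.seq1, "seq2": self.seq2}

    def write(self, out: dict) -> bool:
        if self.lost_writes > 0:
            self.lost_writes -= 1
            return False
        # Decrement-only fields: refuse any write that would raise either one.
        if out["seq1"] > self.seq1 or out["seq2"] > self.seq2:
            return False
        self.seq1, self.seq2 = out["seq1"], out["seq2"]
        self.ink = out.get("ink", self.ink)
        return True


@dataclass
class RefillDevice:
    """Toy Ink Refill QA Device with a single-entry Xfer Entry cache."""
    ink: int
    entry: Optional[dict] = None

    def xfer(self, read: dict, amount: int) -> dict:
        if amount > self.ink:
            raise ValueError("insufficient ink for transfer")
        self.ink -= amount
        self.entry = {"amount": amount, "seq1": read["seq1"], "seq2": read["seq2"]}
        return {"ink": read["ink"] + amount,
                "seq1": read["seq1"] - 2, "seq2": read["seq2"] - 1}

    def start_rollback(self, read: dict) -> dict:
        # A real device would validate the Read output and its signature here.
        e = self.entry
        return {"seq1": e["seq1"] - 1, "seq2": e["seq2"] - 2}

    def rollback(self, read: dict) -> bool:
        e = self.entry
        if e is None or (read["seq1"], read["seq2"]) != (e["seq1"] - 1, e["seq2"] - 2):
            return False  # pre-rollback values not confirmed in the Ink QA Device
        self.ink += e["amount"]
        self.entry = None
        return True


def transfer(ink_dev: InkDevice, refill_dev: RefillDevice, amount: int, retries: int = 3) -> str:
    out = refill_dev.xfer(ink_dev.read(), amount)               # steps 1-2
    if any(ink_dev.write(out) for _ in range(retries)):         # step 3
        return "transferred"
    pre = refill_dev.start_rollback(ink_dev.read())             # steps 4-5
    if not any(ink_dev.write(pre) for _ in range(retries)):     # step 6
        return "rollback pending"
    ok = refill_dev.rollback(ink_dev.read())                    # steps 7-8
    return "rolled back" if ok else "rollback refused"
```

In this model a transfer output that was captured but never applied is refused once the pre-rollback values have been written, because applying it would require raising SEQ_2.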

[7170] 26.3.1 Transfer

[7171] As we mentioned, the Ink QA Device stores ink-remaining values in its M0 fields, and its corresponding M₁ words contain field information for its ink-remaining fields. The field information consists of the size of the field, the type of data stored in the field and the access permission to the field. See Section 8.1.1 for details.

[7172] The Ink Refill QA Device also stores its ink-remaining values in its M0 fields, and its corresponding M₁ words contain field information for its ink-remaining fields.

[7173] 26.3.1.1 Authorisation

[7174] The basic authorisation for a transfer comes from a key which has authenticated ReadWrite permission (stored in field information as KeyNum) to the ink-remaining field (to which ink will be transferred) in the Ink QA Device. We will refer to this key as the refill key. The refill key must also have authenticated decrement-only permission for the ink-remaining field (from which ink will be transferred) in the Ink Refill QA Device.

[7175] After validating the input transfer request, the Ink Refill QA Device will decrement the amount to be transferred from its ink-remaining field, and produce a transfer amount (previous ink-remaining amount in the Ink QA Device + transfer amount), additional field data, and a signature using the refill key. Note that the Ink Refill QA Device can decrement its ink-remaining field only if the refill key has the permission to decrement it.

[7176] The signature produced by the Ink Refill QA Device is subsequently applied to the Ink QA Device. The Ink QA Device will accept the transfer amount only if the signature is valid. Note that the signature will only be valid if it was produced using the refill key, which has write permission to the ink-remaining field being written.

[7177] 26.3.1.2 Data Type Matching

[7178] The Ink Refill QA Device validates the transfer request by matching the Type of the data in the ink-remaining information field of the Ink QA Device to the Type of data in the ink-remaining information field of the Ink Refill QA Device. This ensures that equivalent data Types are transferred, i.e. Network_OEM1_infrared ink is not transferred to Network_OEM1_cyan ink.

[7179] 26.3.1.3 Additional Validation

[7180] Additional validation of the transfer request must also be performed before a transfer output is generated by the Ink Refill QA Device. These checks are as follows:

[7181] For the Ink QA Device:

[7182] 1. Whether the field being upgraded is actually present.

[7183] 2. Whether the field being upgraded can hold the upgraded amount.

[7184] For the Ink Refill QA Device:

[7185] 1. Whether the field from which the amount is transferred is actually present.

[7186] 2. Whether the field has the amount required for the transfer.
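
As a minimal sketch of the Type matching and additional validation described above, the checks can be written as a single function over a source field (in the Ink Refill QA Device) and a destination field (in the Ink QA Device). The Field class and validate_transfer function are hypothetical; the returned strings reuse the ResultFlag names defined for XferAmount, and the ordering of the checks is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical view of one M0 field together with its M1 field information."""
    type: str        # e.g. "Network_OEM1_cyan"
    size_bits: int   # storage size of the field
    value: int       # current logical amount


def validate_transfer(src: Optional[Field], dst: Optional[Field], amount: int) -> str:
    """Return "Pass" or the name of the first failed check."""
    if dst is None:                                   # destination field must be present
        return "FieldNumEInvalid"
    if src is None:                                   # source field must be present
        return "FieldNumLInvalid"
    if src.type != dst.type:                          # equivalent data Types only
        return "TypeMismatch"
    if dst.value + amount > (1 << dst.size_bits) - 1:  # destination must hold the new amount
        return "FieldNumESizeInsufficient"
    if src.value < amount:                            # source must have enough to give
        return "FieldNumLAmountInsufficient"
    return "Pass"
```

For example, a request to move infrared ink into a cyan field fails with TypeMismatch before any amount is decremented.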

[7187] 26.3.1.4 Rollback Facilitation

[7188] To facilitate a rollback, the Ink Refill QA Device will store a list of transfer requests processed by it. This list is referred to as the Xfer Entry cache. Each record in the list consists of the transfer parameters corresponding to the transfer request.

[7189] 26.3.2 Rollback

[7190] A rollback request is validated by looking through the Xfer Entry cache of the Ink Refill QA Device and finding the request that should be rolled back. After the right transfer request is found, the Ink Refill QA Device checks that the output from the transfer request was not applied to the Ink QA Device by comparing the current Read of the Ink QA Device to the values in the Xfer Entry cache, and finally rolls back its ink-remaining field (from which the ink was transferred) to the previous value held before the transfer request was issued.

[7191] The Ink Refill QA Device must be absolutely sure that the Ink QA Device didn't receive the transfer. This factor determines the additional fields that must be written along with the transfer amount, and also the parameters of the transfer request that must be stored in the Xfer Entry cache to facilitate a rollback, to prove that the Ink QA Device didn't actually receive the transfer.

[7192] 26.3.2.1 Sequence Fields

[7193] The rollback process must ensure that the transfer output (which was previously produced) for which the rollback is being performed cannot be applied after the rollback has been performed. How do we achieve this? There are two separate decrement-only sequence fields (SEQ_1 and SEQ_2) in the Ink QA Device which can only be decremented by the Ink Refill QA Device using the refill key. The nature of the data to be written to the sequence fields is such that either the transfer output or the pre-rollback output can be applied to the Ink QA Device, but not both, i.e. they must be mutually exclusive. Refer to Table 285 for details.

TABLE 285 Sequence field data for Transfer and Pre-rollback

| Function | SEQ_1 data written to Ink QA Device | SEQ_2 data written to Ink QA Device | Explanation |
|---|---|---|---|
| Initialised | 0xFFFFFFFF | 0xFFFFFFFF | Written using the sequence key, which is different from the refill key. |
| Write using Transfer Output | (Previous Value − 2). If Previous Value = initialised value then 0xFFFFFFFD | (Previous Value − 1). If Previous Value = initialised value then 0xFFFFFFFE | Written using the refill key, which has decrement-only permission on the fields. Value cannot be written if pre-rollback output is already written. |
| Write using Pre-rollback | (Previous Value − 1). If Previous Value = initialised value then 0xFFFFFFFE | (Previous Value − 2). If Previous Value = initialised value then 0xFFFFFFFD | Written using the refill key, which has decrement-only permission on the fields. Value can be written only if Transfer Output has not been written. |

[7194] The two sequence fields are initialised to 0xFFFFFFFF using the sequence key. The sequence key is different to the refill key, and has authenticated ReadWrite permission to both the sequence fields. The transfer output consists of the new data for the field being upgraded, field data of the two sequence fields, and a signature using the refill key. The field data for SEQ_1 is decremented by 2 from the original value that was passed in with the transfer request. The field data for SEQ_2 is decremented by 1 from the original value that was passed in with the transfer request.

[7195] The pre-rollback output consists only of the field data of the two sequence fields, and a signature using the refill key. The field data for SEQ_1 is decremented by 1 from the original value that was passed in with the transfer request. The field data for SEQ_2 is decremented by 2 from the original value that was passed in with the transfer request.

[7196] Since the two sequence fields are decrement-only fields, the writing of the transfer output to the QA Device being upgraded will prevent the writing of the pre-rollback output to the QA Device being upgraded. If the writing of the transfer output fails, then the pre-rollback output can be written. However, the transfer output cannot be written after the pre-rollback output has been written.

[7197] Before a rollback is performed, the Ink Refill QA Device must confirm that the sequence fields were successfully written to the pre-rollback values in the Ink QA Device. Because the sequence fields are decrement-only fields, the Ink QA Device will allow the pre-rollback output to be written only if the transfer output has not been written. It also means that the transfer output cannot be written after the pre-rollback values have been written.
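
The mutual exclusion of the two outputs follows directly from the arithmetic, as the following sketch illustrates. It assumes that a decrement-only write is accepted only when no field would increase and that both sequence fields are written together; the function names are hypothetical.

```python
INIT = 0xFFFFFFFF  # initialised value of SEQ_1 and SEQ_2


def transfer_seq(prev1: int, prev2: int) -> tuple:
    """Sequence data carried by the transfer output: SEQ_1 - 2, SEQ_2 - 1."""
    return prev1 - 2, prev2 - 1


def pre_rollback_seq(prev1: int, prev2: int) -> tuple:
    """Sequence data carried by the pre-rollback output: SEQ_1 - 1, SEQ_2 - 2."""
    return prev1 - 1, prev2 - 2


def decrement_only_write_ok(current: tuple, new: tuple) -> bool:
    """Assumed rule: the write is refused if any field would increase."""
    return all(n <= c for c, n in zip(current, new))


start = (INIT, INIT)
xfer = transfer_seq(*start)      # (0xFFFFFFFD, 0xFFFFFFFE)
pre = pre_rollback_seq(*start)   # (0xFFFFFFFE, 0xFFFFFFFD)

# Either output can be written from the initial state...
first_ok = decrement_only_write_ok(start, xfer) and decrement_only_write_ok(start, pre)
# ...but once one is written, the other would have to raise a field.
pre_after_xfer = decrement_only_write_ok(xfer, pre)   # SEQ_1 would rise: refused
xfer_after_pre = decrement_only_write_ok(pre, xfer)   # SEQ_2 would rise: refused
```

Whichever output is written first, the other then requires one of the two fields to increase, so it is refused.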

[7198] 26.3.2.1.1 Field Information of the Sequence Data Field

[7199] For a device to be upgradeable the device must have two sequence fields, SEQ_1 and SEQ_2, which are written with sequence data during the transfer sequence. Thus all upgrading QA Devices, Ink QA Devices and Printer QA Devices must have two sequence fields. The upgrading QA Devices must also have these fields because they can be upgraded as well.

[7200] The sequence field information is defined in Table 286.

TABLE 286 Sequence field information

| Attribute Name | Value | Explanation |
|---|---|---|
| Type | TYPE_SEQ_1 or TYPE_SEQ_2. | See Appendix A for exact value. |
| KeyNum | Slot number of the sequence key. | Only the sequence key has authenticated ReadWrite access to this field. |
| Non Auth RW Perm | 0 | Non authenticated ReadWrite is not allowed to the field. |
| Auth RW Perm | 1 | Authenticated (key based) ReadWrite access is allowed to the field. |
| KeyPerm | KeyPerms[KeyNum] = 0 | KeyNum is the slot number of the sequence key, which has ReadWrite permission to the field. |
| | KeyPerms[Slot number of the refill key] = 1 | Refill key can decrement the sequence field. |
| | KeyPerms[others = 0 . . . 7 (except refill key)] = 0 | All other keys have ReadOnly access. |
| End Pos | Set as required. | Size is typically 1 word. |

[7201] 26.3.3 Upgrade States

[7202] There are three states in a transfer sequence. The first state is initiated for every transfer, while the next two states are initiated only when the transfer fails. The states are Xfer, StartRollBack, and Rollback.

[7203] 26.3.3.1 Upgrade Flow

[7204] FIG. 380 shows a typical upgrade flow.

[7205] 26.3.3.2 Xfer

[7206] This state indicates the start of the transfer process, and is the only state required if the transfer is successful. During this state, the Ink Refill QA Device adds a new record to its Xfer Entry cache, decrements its amount, and produces the new amount, new sequence data (as described in Section 26.3.2.1) and a signature based on the refill key.

[7207] The Ink QA Device will subsequently write the new amount and new sequence data, after verifying the signature. If the new amount can be successfully written to the Ink QA Device, then this will finish a successful transfer.

[7208] If the writing of the new amount is unsuccessful (result returned is BAD SIG), the System will re-transmit the transfer output to the Ink QA Device by calling the authenticated Write function on it again, using the same transfer output.

[7209] If retrying to write the same transfer output fails repeatedly, the System will start the rollback process on the Ink Refill QA Device by calling the Read function on the Ink QA Device, and subsequently calling the StartRollBack function on the Ink Refill QA Device. After a successful rollback is performed, the System will invoke the transfer sequence again.

[7210] 26.3.3.3 StartRollBack

[7211] This state indicates the start of the rollback process. During this state, the Ink Refill QA Device produces the next sequence data and a signature based on the refill key. This is also called a pre-rollback, as described in Section 26.3.2.

[7212] The pre-rollback output can only be written to the Ink QA Device if the previous transfer output has not been written. The writing of the pre-rollback sequence data also ensures that if the previous transfer output was captured and not applied, then it cannot be applied to the Ink QA Device in the future.

[7213] If the writing of the pre-rollback output is unsuccessful (result returned is BAD SIG), the System will re-transmit the pre-rollback output to the Ink QA Device by calling the authenticated Write function on it again, using the same pre-rollback output.

[7214] If retrying to write the same pre-rollback output fails repeatedly, the System will call StartRollBack on the Ink Refill QA Device again, and subsequently call the authenticated Write function on the Ink QA Device using this output.

[7215] 26.3.3.4 Rollback

[7216] This state indicates a successful deletion (completion) of a transfer sequence. During this state, the Ink Refill QA Device verifies that the sequence data produced from StartRollBack has been correctly written to the Ink QA Device, then rolls back its ink-remaining field to the previous value held before the transfer request was issued.

[7217] 26.3.4 Xfer Entry Cache

[7218] The Xfer Entry data structure must allow for the following:

[7219] Store the transfer state and sequence data for a given transfer sequence.

[7220] Store all data corresponding to a given transfer, to facilitate a rollback to the previous value held before the transfer output was generated.

[7221] The Xfer Entry cache depth will depend on the QA Chip Logical Interface implementation. For some implementations a single Xfer Entry value will be saved. If the Ink Refill QA Device has no powersafe storage for the Xfer Entry cache, a power down will erase the Xfer Entry cache and the Ink Refill QA Device will not be able to roll back to a pre-power-down value.

[7222] A dataset in the Xfer Entry cache will consist of the following:

[7223] Information about the QA Device being upgraded:

[7224] a. ChipId of the device.

[7225] b. FieldNum of the M0 field (i.e. what was being upgraded).

[7226] Information about the upgrading QA Device:

[7227] a. FieldNum of the M0 field used to transfer the amount from.

[7228] XferVal: the transfer amount.

[7229] Xfer State: indicates which state the transfer sequence is in. This will consist of:

[7230] a. State definition, which could be one of the following: Xfer, StartRollBack and complete/deleted.

[7231] b. The value of sequence data fields SEQ_1 and SEQ_2.

[7232] 26.3.4.1 Adding New Dataset

[7233] A new dataset is added to Xfer Entry cache by the Xfer function.

[7234] There are three methods which can be used to add a new dataset to the Xfer Entry cache. The methods are listed below in the order of their priority:

[7235] 1. Replace an existing dataset in the Xfer Entry cache with the new dataset based on the ChipId and FieldNum of the Ink QA Device in the new dataset. A matching ChipId and FieldNum could be found because a previous transfer output corresponding to the dataset stored in the Xfer Entry cache has been correctly received and processed by the Ink QA Device, and a new transfer request for the same Ink QA Device, same field, has come through to the Ink Refill QA Device.

[7236] 2. Replace an existing dataset in the cache with the new dataset based on the Xfer State. If the Xfer State for a dataset indicates deleted (complete), then such a dataset will not be used for any further functions, and can be overwritten by a new dataset.

[7237] 3. Add the new dataset to the end of the cache. This will automatically delete the oldest dataset from the cache regardless of the Xfer State.
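
The three methods and their priority can be sketched as follows. This is an illustrative model only: the class and attribute names are hypothetical, the cache is assumed to have a fixed depth, and the oldest dataset is assumed to be evicted only when the cache is already full.

```python
from dataclasses import dataclass


@dataclass
class XferEntry:
    chip_id: int       # ChipId of the QA Device being upgraded
    field_num_e: int   # M0 field being upgraded
    field_num_l: int   # M0 field the amount is transferred from
    xfer_val: int      # transfer amount
    state: str         # "Xfer", "StartRollBack" or "Deleted"
    seq1: int
    seq2: int


class XferEntryCache:
    def __init__(self, depth: int):
        self.depth = depth
        self.entries = []  # oldest dataset first

    def add(self, new: XferEntry) -> str:
        # Method 1: replace a dataset with matching ChipId and FieldNum.
        for i, e in enumerate(self.entries):
            if (e.chip_id, e.field_num_e) == (new.chip_id, new.field_num_e):
                self.entries[i] = new
                return "replaced-match"
        # Method 2: overwrite a dataset whose Xfer State is deleted (complete).
        for i, e in enumerate(self.entries):
            if e.state == "Deleted":
                self.entries[i] = new
                return "replaced-deleted"
        # Method 3: append, dropping the oldest dataset regardless of its state.
        if len(self.entries) >= self.depth:
            self.entries.pop(0)
        self.entries.append(new)
        return "appended"
```

With a depth of 1 the cache degenerates to the single saved Xfer Entry mentioned earlier, and every new transfer overwrites the previous record.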

[7238] 26.4 Different Types of Transfer

[7239] There can be three types of transfer:

[7240] Peer to Peer Transfer: this transfer could be one of the two types described below:

[7241] a. From an Ink Refill QA Device to an Ink QA Device. This is performed when the Ink QA Device is refilled by the Ink Refill QA Device.

[7242] b. From one Ink Refill QA Device to another Ink Refill QA Device, where both QA Devices belong to the same OEM. This is typically performed when an OEM divides ink from one Ink Refill QA Device to another Ink Refill QA Device, where both devices belong to the same OEM.

[7243] Hierarchical Transfer: this is a transfer from one Ink Refill QA Device to another Ink Refill QA Device, where the QA Devices belong to different organisations, say ComCo and OEM. This is typically performed when ComCo divides ink from its refill device to several refill devices belonging to several OEMs.

[7244] FIG. 381 is a representation of various authorised ink refill paths in the printing system.

[7245] 26.4.1 Hierarchical Transfer

[7246] Referring to FIG. 381, this transfer is typically performed when ink is transferred from ComCo's Ink Refill QA Device to an OEM's Ink Refill QA Device, or from QACo's Ink Refill QA Device to ComCo's Ink Refill QA Device.

[7247] 26.4.1.1 Keys and Access Permission

[7248] We will explain this using a transfer from ComCo to OEM.

[7249] There is an ink-remaining field associated with the ComCo's Ink Refill QA Device. This ink-remaining field has two keys associated with it:

[7250] The first key transfers ink to the device from another refill device (which is higher in the hierarchy), i.e. it fills/refills (upgrades) the device itself. This key has authenticated ReadWrite permission to the field.

[7251] The second key transfers ink from it to other devices (which are lower in the hierarchy), i.e. it fills/refills (upgrades) other devices from it. This key has authenticated decrement-only permission to the field.

[7252] There is an ink-remaining field associated with the OEM's Ink Refill device. This ink-remaining field has a single key associated with it:

[7253] This key transfers ink to the device from another refill device (which is higher or at the same level in the hierarchy), i.e. it fills/refills (upgrades) the device itself, and additionally transfers ink from it to other devices (which are lower in the hierarchy), i.e. it fills/refills (upgrades) other devices from it. Therefore, this key has both authenticated ReadWrite and decrement-only permission to the field. For a successful transfer of ink from ComCo's refill device to an OEM's refill device, the ComCo's refill device and the OEM's refill device must share a common key or a variant key. This key is the fill/refill key with respect to the OEM's refill device and it is the transfer key with respect to the ComCo's refill device.

[7254] For a ComCo to successfully fill/refill its refill device from another refill device (which is higher in the hierarchy, possibly belonging to the QACo), the ComCo's refill device and the QACo's refill device must share a common key or a variant key. This key is the fill/refill key with respect to the ComCo's refill device and it is the transfer key with respect to the QACo's refill device.

[7255] 26.4.1.1.1 Ink-Remaining Field Information

[7256] Table 287 shows the field information for an M0 field storing logical ink-remaining amounts in the refill device and which has the ability to transfer down the hierarchy.

| Attribute Name | Value | Explanation |
|---|---|---|
| Type | For e.g. TYPE_HIGHQUALITY_BLACK_INK^(a) | Type describing the logical ink stored in the ink-remaining field in the refill device. |
| KeyNum | Slot number of the refill key. | Only the refill key has authenticated ReadWrite access to this field. |
| Non Auth RW Perm^(b) | 0 | Non authenticated ReadWrite is not allowed to the field. |
| Auth RW Perm^(c) | 1 | Authenticated (key based) ReadWrite access is allowed to the field. |
| KeyPerm | KeyPerms[KeyNum] = 0 | KeyNum is the slot number of the refill key, which has ReadWrite permission to the field. |
| | KeyPerms[Slot Num of transfer key] = 1 | Transfer key can decrement the field. |
| | KeyPerms[others = 0 . . . 7 (except transfer key)] = 0 | All other keys have ReadOnly access. |
| End Pos | Set as required. | Depends on the amount of logical ink the device can store and storage resolution, i.e. in picolitres or in microlitres. |

[7257] 26.4.2 Peer to Peer Transfer

[7258] Referring to FIG. 381, this transfer is typically performed when ink is transferred from an OEM's Ink Refill Device to another Ink Refill Device belonging to the same OEM, or from an OEM's Ink Refill Device to an Ink Device belonging to the same OEM.

[7259] 26.4.2.1 Keys and Access Permission

[7260] There is an ink-remaining field associated with the refill device which transfers ink amounts to other refill devices (peer devices), or to other ink devices. This ink-remaining field has a single key associated with it:

[7261] This key transfers ink to the device from another refill device (which is higher or at the same level in the hierarchy), i.e. it fills/refills (upgrades) the device itself, and additionally transfers ink from it to other devices (which are lower in the hierarchy), i.e. it fills/refills (upgrades) other devices from it.

[7262] This key is referred to as the fill/refill key and is used for both fill/refill and transfer. Hence, this key has both ReadWrite and Decrement-Only permission to the ink-remaining field in the refill device.

[7263] 26.4.2.1.1 Ink-Remaining Field Information

[7264] Table 288 shows the field information for an M0 field storing logical ink-remaining amounts in the refill device with the ability to transfer between peers.

| Attribute Name | Value | Explanation |
|---|---|---|
| Type | For e.g. TYPE_HIGHQUALITY_BLACK_INK^(a) | Type describing the logical ink stored in the ink-remaining field in the refill device. |
| KeyNum | Slot number of the refill key. | Only the refill key has authenticated ReadWrite access to this field. |
| Non Auth RW Perm^(b) | 0 | Non authenticated ReadWrite is not allowed to the field. |
| Auth RW Perm^(c) | 1 | Authenticated (key based) ReadWrite access is allowed to the field. |
| KeyPerm | KeyPerms[KeyNum] = 1 | KeyNum is the slot number of the refill key, which has ReadWrite and Decrement permission to the field. |
| | KeyPerms[others = 0 . . . 7 (except KeyNum)] = 0 | All other keys have ReadOnly access. |
| End Pos | Set as required. | Depends on the amount of logical ink the device can store and storage resolution, i.e. in picolitres or in microlitres. |
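
One way to read the KeyNum, Auth RW Perm and KeyPerms attributes in the field information tables is sketched below. This is an interpretation only, since the exact semantics are defined in Section 8.1.1, which is not reproduced here; the function name and the slot numbers used in the examples are hypothetical.

```python
def field_rights(key_slot: int, key_num: int, auth_rw_perm: int, key_perms: list) -> set:
    """Rights that the key in key_slot is assumed to hold over one field."""
    rights = {"read"}                       # every key has at least ReadOnly access
    if auth_rw_perm == 1 and key_slot == key_num:
        rights.add("write")                 # KeyNum key has authenticated ReadWrite
    if key_perms[key_slot] == 1:
        rights.add("decrement")             # KeyPerms bit grants decrement-only
    return rights


# Hierarchical case: refill key in slot 0, transfer key in slot 1 (assumed slots).
hier_perms = [0, 1, 0, 0, 0, 0, 0, 0]
refill_key_rights = field_rights(0, 0, 1, hier_perms)    # read and write
transfer_key_rights = field_rights(1, 0, 1, hier_perms)  # read and decrement

# Peer to peer case: a single fill/refill key in slot 0 holds both permissions.
peer_perms = [1, 0, 0, 0, 0, 0, 0, 0]
peer_key_rights = field_rights(0, 0, 1, peer_perms)      # read, write and decrement
```

The difference between the hierarchical and peer to peer field information then reduces to which slot carries the decrement bit.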

[7265] 27 Functions

[7266] 27.1 XferAmount

[7267] Input: KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, XferValLength, XferVal, InputParameterCheck (optional), R_(E), SIG_(E), R_(E2)

[7268] Output: ResultFlag, FieldSelect, FieldVal, R_(L2), SIG_(out)

[7269] Changes: M₀ and R_(L)

[7270] Availability: Ink Refill QA Device

[7271] 27.1.1 Function Description

[7272] The XferAmount function produces data and a signature for updating a given M0 field. This data and signature, when applied to the appropriate device through the WriteFieldsAuth function, will update the M0 field of the device.

[7273] The system calls the XferAmount function on the upgrade device with a certain XferVal. This XferVal is validated by the XferAmount function against various rules as described in Section 27.1.4. The function then produces the data and signature to be passed into the WriteFieldsAuth function of the device being upgraded.

[7274] The transfer amount output consists of the new data for the field being upgraded, field data of the two sequence fields, and a signature using the refill key. When a transfer output is produced, the sequence field data in SEQ_1 is decremented by 2 from the previous value (as passed in with the input), and the sequence field data in SEQ_2 is decremented by 1 from the previous value (as passed in with the input).

[7275] An additional InputParameterCheck value must be provided for the parameters not included in SIG_(E) if the transmission between the System and the Ink Refill QA Device is error prone and these errors are not corrected by the transmission protocol itself. InputParameterCheck is SHA-1[FieldNumL|FieldNumE|XferValLength|XferVal], and is required to ensure the integrity of these parameters when these inputs are received by the Ink Refill QA Device. This will prevent an incorrect transfer amount being deducted.

[7276] The XferAmount function must first calculate SHA-1[FieldNumL|FieldNumE|XferValLength|XferVal], compare the calculated value to the value received (InputParameterCheck), and act upon the inputs only if the values match.
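
A minimal sketch of the InputParameterCheck calculation and comparison is given below. The byte layout is an assumption made purely for illustration (one byte each for FieldNumL, FieldNumE and XferValLength, followed by XferVal as big-endian 32-bit words); the actual encoding is not specified in this section, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct


def input_parameter_check(field_num_l: int, field_num_e: int, xfer_val_words: list) -> bytes:
    """SHA-1 over FieldNumL | FieldNumE | XferValLength | XferVal (assumed layout)."""
    msg = struct.pack(">BBB", field_num_l, field_num_e, len(xfer_val_words))
    msg += b"".join(struct.pack(">I", w) for w in xfer_val_words)
    return hashlib.sha1(msg).digest()


def inputs_accepted(field_num_l: int, field_num_e: int, xfer_val_words: list,
                    received_check: bytes) -> bool:
    """Recompute the check and act on the inputs only if it matches."""
    expected = input_parameter_check(field_num_l, field_num_e, xfer_val_words)
    return hmac.compare_digest(expected, received_check)
```

A single corrupted bit in XferVal changes the digest, so the function would refuse to deduct the wrong amount.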

[7277] 27.1.2 Input Parameters

[7278] Table 289 describes each of the input parameters for the XferAmount function.

| Parameter | Description |
|---|---|
| KeyRef | For common key input and output signatures: KeyRef.keyNum = Slot number of the key to be used for testing the input signature and producing the output signature. SIG_(E) produced using K_(KeyRef.keyNum) by the QA Device being upgraded. SIG_(out) produced using K_(KeyRef.keyNum) for delivery to the QA Device being upgraded. KeyRef.useChipId = 0. For variant key input and output signatures: KeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(E) produced using a variant of K_(KeyRef.keyNum) by the QA Device being upgraded. SIG_(out) produced using a variant of K_(KeyRef.keyNum) for delivery to the QA Device being upgraded. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which generated SIG_(E) and will receive SIG_(out). |
| M0OfExternal | All 16 words of M0 of the QA Device being upgraded. |
| M1OfExternal | All 16 words of M1 of the QA Device being upgraded. |
| ChipId | ChipId of the QA Device being upgraded. |
| FieldNumL | M0 field number of the local (refill) device from which the value will be transferred. |
| FieldNumE | M0 field number of the QA Device being upgraded to which the value will be transferred. |
| XferValLength | XferVal length in words. Non zero length required. |
| XferVal | The logical amount that will be transferred from the local device to the external device. |
| R_(E) | External random value used to verify the input signature. This will be the R from the input signature generator (i.e. the device generating SIG_(E)). The input signature generator in this case is the device being upgraded or a translation device. |
| R_(E2) | External random value used to produce the output signature. This will be the R obtained by calling the Random function on the device which will receive the SIG_(out) from the XferAmount function. The device receiving the SIG_(out) in this case is the device being upgraded or a translation device. |
| SIG_(E) | External signature required for authenticating input data. The input data in this case is the output from the Read function performed on the device being upgraded. A correct SIG_(E) = SIG_(KeyRef)(Data \| R_(E) \| R_(L)). |

[7279] 27.1.2.1 Input Signature Verification Data Format

[7280] The input signature passed in to the XferAmount function is the output signature from the Read function of the Ink QA Device.

[7281] FIG. 382 shows the input signature verification data format for the XferAmount function.

[7282] Table 290 gives the parameters included in SIG_(E) for XferAmount.

| Parameter | Length in bits | Value set internally | Value set from Input |
|---|---|---|---|
| RWSense | 3 | 000 (refer to Section 15.3.1.1) | |
| MSelect | 4 | 0011 | |
| KeyIdSelect | 8 | 00000000 | |
| ChipId | 48 | | ChipId of the QA Device being upgraded |
| WordSelect | 16 | All bits for M₀ set to 1 | |
| WordSelect | 16 | All bits for M₁ set to 1 | |
| M0 | 512 | | • |
| M1 | 512 | | • |
| R_(E) | 160 | | • |
| R_(L) | 160 | Based on the internal R | • |

[7283] The XferAmount function is not passed all the parameters required to generate SIG_(E). For producing SIG_(L), which is used to test SIG_(E), the function uses the expected values of some of the parameters.
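
The way the function can rebuild the signed message from expected values plus its inputs is sketched below as a bit-field concatenation. The field widths are those listed for SIG_(E); the packing order (first field most significant) and the function names are assumptions, the authoritative layout being the data format of FIG. 382. The signature calculation itself is not shown.

```python
def pack_bits(fields: list) -> tuple:
    """Concatenate (value, width_in_bits) pairs, first pair most significant."""
    acc, total = 0, 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError(f"value does not fit in {width} bits")
        acc = (acc << width) | value
        total += width
    return acc, total


def sig_e_message(chip_id: int, m0: int, m1: int, r_e: int, r_l: int) -> tuple:
    """Expected values are fixed internally; the rest come from the inputs."""
    return pack_bits([
        (0b000, 3),      # RWSense, expected value
        (0b0011, 4),     # MSelect, expected value
        (0, 8),          # KeyIdSelect, expected value
        (chip_id, 48),   # ChipId of the QA Device being upgraded
        (0xFFFF, 16),    # WordSelect for M0, all bits set
        (0xFFFF, 16),    # WordSelect for M1, all bits set
        (m0, 512),       # M0OfExternal
        (m1, 512),       # M1OfExternal
        (r_e, 160),      # external random value
        (r_l, 160),      # internal random value
    ])
```

Under these assumptions the message is 1439 bits long, the sum of the listed field widths.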

[7284] 27.1.3 Output Parameters

[7285] Table 291 describes each of the output parameters for XferAmount.

| Parameter | Description |
|---|---|
| ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Table 47. |
| FieldSelect | Selection of fields to be written. In this case the bits corresponding to SEQ_1, SEQ_2 and FieldNumE are set to 1. All other bits are set to 0. |
| FieldVal | Updated data words for the Sequence data fields and FieldNumE for the QA Device being upgraded. Starts with the LSW of the lower field. This must be passed as input to the WriteFieldsAuth function of the QA Device being upgraded. |
| R_(L2) | Internal random value required to generate the output signature. This must be passed as input to the WriteFieldsAuth function or Translate function of the QA Device being upgraded. |
| SIG_(out) | Output signature which must be passed as an input to the WriteFieldsAuth function of the QA Device being upgraded. SIG_(out) = SIG_(KeyRef)(data \| R_(L2) \| R_(E2)) as per FIG. 373. |

[7286] TABLE 292 Result Flag definitions for XferAmount

| ResultFlag Definition | Description |
|---|---|
| FieldNumEInvalid | FieldNum to which the amount is being transferred, or which is being upgraded in the QA Device being upgraded, is invalid. |
| SeqFieldInvalid | The sequence field for the QA Device being upgraded is invalid. |
| FieldNumEWritePermInvalid | FieldNum to which the amount is being transferred, or which is being upgraded in the QA Device being upgraded, has no authenticated write permission. |
| FieldNumLInvalid | FieldNum from which the amount is being transferred, or from which the value is being copied in the Upgrading QA Device, is invalid. |
| FieldNumLWritePermInvalid | FieldNum from which the amount is being transferred in the Upgrading QA Device has no authenticated permission, or no authenticated permission with the KeyRef. |
| TypeMismatch | Type of the data from which the amount is being transferred in the Upgrading QA Device doesn't match the Type of data to which the amount is being transferred in the Device being upgraded. |
| UpgradeFieldEInvalid | Only applicable for transferring count-remaining values. The upgrade field associated with the count-remaining field in the QA Device being upgraded is invalid. |
| UpgradeFieldLInvalid | Only applicable for transferring count-remaining values. The upgrade field associated with the count-remaining field in the Upgrading QA Device is invalid. |
| UpgradeFieldMismatch | Only applicable for transferring count-remaining values. Type of the data in the upgrade field in the Upgrading QA Device doesn't match the Type of data in the upgrade field in the Device being upgraded. |
| FieldNumESizeInsufficient | FieldNum to which the amount is being transferred, or which is being upgraded in the QA Device, is not big enough to store the transferred data. |
| FieldNumLAmountInsufficient | FieldNum in the Upgrading QA Device from which the amount is being transferred doesn't have the amount required for the transfer. |

[7287] 27.1.3.1 SIG_(out)

[7288] Refer to Section 20.2.1 for details.

[7289] 27.1.4 Function Sequence

[7290] The XferAmount command is illustrated by the following pseudocode:

    Accept input parameters - KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, XferValLength
    # Accept XferVal words
    For i ← 0 to XferValLength
        Accept next XferVal
    EndFor
    Accept R_(E), SIG_(E), R_(E2)
    # Generate message for passing into ValidateKeyRefAndSignature function
    data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|M0|M1)  # Refer to Figure 382.
    ----------------------------------------------------------------
    # Validate KeyRef, and then verify signature
    ResultFlag = ValidateKeyRefAndSignature(KeyRef, data, R_(E), R_(L))
    If (ResultFlag ≠ Pass)
        Output ResultFlag
        Return
    EndIf
    ----------------------------------------------------------------
    # Validate FieldNumE
    # FieldNumE is present in the device being upgraded
    PresentFlagFieldNumE ← GetFieldPresent(M1OfExternal, FieldNumE)
    # Check FieldNumE present flag
    If (PresentFlagFieldNumE ≠ 1)
        ResultFlag ← FieldNumEInvalid
        Output ResultFlag
        Return
    EndIf
    ----------------------------------------------------------------
    # Check Seq Fields exist and get their Field Num
    # Get SeqData field SEQ_1 num for the device being upgraded
    XferSEQ_1FieldNum ← GetFieldNum(M1OfExternal, SEQ_1)
    # Check if the SeqData field SEQ_1 is valid
    If (XferSEQ_1FieldNum invalid)
        ResultFlag ← SeqFieldInvalid
        Output ResultFlag
        Return
    EndIf
    # Get SeqData field SEQ_2 num for the device being upgraded
    XferSEQ_2FieldNum ← GetFieldNum(M1OfExternal, SEQ_2)
    # Check if the SeqData field SEQ_2 is valid
    If (XferSEQ_2FieldNum invalid)
        ResultFlag ← SeqFieldInvalid
        Output ResultFlag
        Return
    EndIf
    ----------------------------------------------------------------
    # Check write permission for FieldNumE
    PermOKFieldNumE ← CheckFieldNumEPerm(M1OfExternal, FieldNumE)
    If (PermOKFieldNumE ≠ 1)
        ResultFlag ← FieldNumEWritePermInvalid
        Output ResultFlag
        Return
    EndIf
    ----------------------------------------------------------------
    # Check that both SeqData fields have Decrement-Only permission with the same key
    # that has write permission on FieldNumE
    PermOKXferSeqData ← CheckSeqDataFieldPerms(M1OfExternal, XferSEQ_1FieldNum, XferSEQ_2FieldNum, FieldNumE)
    If (PermOKXferSeqData ≠ 1)
        ResultFlag ← SeqWritePermInvalid
        Output ResultFlag
        Return
    EndIf
    ----------------------------------------------------------------
    # Get SeqData SEQ_1 data from device being upgraded
    GetFieldDataWords(XferSEQ_1FieldNum, XferSEQ_1DataFromDevice, M0OfExternal, M1OfExternal)
    # Get SeqData SEQ_2 data from device being upgraded
    GetFieldDataWords(XferSEQ_2FieldNum, XferSEQ_2DataFromDevice, M0OfExternal, M1OfExternal)
    ----------------------------------------------------------------
    # Check FieldNumL is present in the refill device
    PresentFlagFieldNumL ← GetFieldPresent(M1, FieldNumL)
    If (PresentFlagFieldNumL ≠ 1)
        ResultFlag ← FieldNumLInvalid
        Output ResultFlag
        Return
    EndIf
    # Check permission for FieldNumL
    PermOKFieldNumL ← CheckFieldNumLPerm(M1, FieldNumL, KeyRef)
    If (PermOKFieldNumL ≠ 1)
        ResultFlag ← FieldNumLWritePermInvalid
        Output ResultFlag
        Return
    EndIf
    ----------------------------------------------------------------
    # Find the type attribute for FieldNumE
    TypeFieldNumE ← FindFieldNumType(M1OfExternal, FieldNumE)
    # Find the type attribute for FieldNumL
    TypeFieldNumL ← FindFieldNumType(M1, FieldNumL)
    # Check type attribute for both fields match
    If (TypeFieldNumE ≠ TypeFieldNumL)
        ResultFlag ← TypeMismatch
        Output ResultFlag
        Return
    EndIf

[7291] # Do this if the Refill Device is transferring Count-remaining for Printer upgrades
# If the Type is count remaining, check that upgrade values associated with
# the count remaining are valid. Refer to Section 28 for further details on
# count remaining and upgrade value.
If (TypeFieldNumL = TYPE_COUNT_REMAINING) ∧ (TypeFieldNumE = TYPE_COUNT_REMAINING)
    # Upgrade value field is lower adjoining field
    UpgradeValueFieldNumE = FieldNumE − 1
    If (UpgradeValueFieldNumE < 0) # upgrade field doesn't exist for QA Device being upgraded
        ResultFlag ← UpgradeFieldEInvalid
        Output ResultFlag
        Return
    EndIf
    UpgradeValueFieldNumL = FieldNumL − 1
    If (UpgradeValueFieldNumL < 0) # upgrade field doesn't exist for local device
        ResultFlag ← UpgradeFieldLInvalid
        Output ResultFlag
        Return
    EndIf
    UpgradeValueCheckOK ← UpgradeValCheck(UpgradeValueFieldNumL, M0, M1, UpgradeValueFieldNumE, M0OfExternal, M1OfExternal, KeyRef)
    If (UpgradeValueCheckOK = 0)
        ResultFlag ← UpgradeFieldMismatch
        Output ResultFlag
        Return
    EndIf
EndIf
# Do this if Field Type is Count Remaining........end
----------------------------------------------------------------
# Check whether the device being upgraded can hold the transfer amount
# (XferVal + AmountLeft)
OverFlow ← CanHold(FieldNumE, M0OfExternal, XferVal)
If OverFlow error
    ResultFlag ← FieldNumESizeInsufficient
    Output ResultFlag
    Return
EndIf
----------------------------------------------------------------
# Check the refill device has the desired amount (XferVal <= AmountLeft)
UnderFlow ← HasAmount(FieldNumL, M0, XferVal)
If UnderFlow error
    ResultFlag ← FieldNumLAmountInsufficient
    Output ResultFlag
    Return
EndIf
----------------------------------------------------------------
# All checks complete .....
# Generate Seqdata for SEQ_1 and SEQ_2 fields
XferSEQ_1DataToDevice = XferSEQ_1DataFromDevice − 2
XferSEQ_2DataToDevice = XferSEQ_2DataFromDevice − 1
# Add DataSet to Xfer Entry Cache
AddDataSetToXferEntryCache(ChipId, FieldNumE, FieldNumL, XferLength, XferVal, XferSEQ_1DataFromDevice, XferSEQ_2DataFromDevice)
# Get current FieldDataE field data words to write to Xfer Entry cache
GetFieldDataWords(FieldNumE, FieldDataE, M0OfExternal, M1OfExternal)
# Deduct XferVal from FieldNumL and write new value
DeductAndWriteValToFieldNumL(XferVal, FieldNumL, M0)
# Generate new field data words for FieldNumE. The current FieldDataE is added to
# XferVal to generate new FieldDataE
GenerateNewFieldData(FieldNumE, XferVal, FieldDataE)
# Generate FieldSelect and FieldVal for SeqData field SEQ_1, SEQ_2 and
# FieldDataE...
CurrentFieldSelect ← 0
FieldVal ← 0
GenerateFieldSelectAndFieldVal(FieldNumE, FieldDataE, XferSEQ_1FieldNum, XferSEQ_1DataToDevice, XferSEQ_2FieldNum, XferSEQ_2DataToDevice, FieldSelect, FieldVal)
# Generate message for passing into GenerateSignature function
data ← (RWSense|FieldSelect|ChipId|FieldVal) # Refer to Figure 373.
# Create output signature for FieldNumE
SIG_(out) ← GenerateSignature(KeyRef,data,R_(L2),R_(E2))
Update R_(L2) to R_(L3)
ResultFlag ← Pass
Output ResultFlag, FieldData, R_(L2), SIG_(out)
Return

[7292] 27.1.4.1 ResultFlag ValidateKeyRefAndSignature(KeyRef,data,R_(E),R_(L))

[7293] This function checks that KeyRef is valid and, if it is, verifies the input signature using KeyRef.

CheckRange(KeyRef.keyNum)
If invalid
    ResultFlag ← InValidKey
    Output ResultFlag
    Return
EndIf
# Generate message for passing into GenerateSignature function
data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|M0|M1) # Refer to Figure 382.
# Generate Signature
SIG_(L) ← GenerateSignature(KeyRef,data,R_(E),R_(L))
# Check input signature SIG_(E)
If (SIG_(L) = SIG_(E))
    Update R_(L) to R_(L2)
Else
    ResultFlag ← Bad Signature
    Output ResultFlag
    Return
EndIf
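The validation order above (key range first, then signature comparison) can be sketched in Python. This is a minimal illustration under stated assumptions, not the device's implementation: HMAC-SHA1 stands in for the SIG function (which is defined elsewhere in the specification), the key store is a plain dictionary, eight key slots are assumed, and `sig_e` is passed explicitly although the pseudocode treats SIG_(E) as already accepted input. The update of R_(L) to R_(L2) is omitted.

```python
import hashlib
import hmac

NUM_KEY_SLOTS = 8  # assumption: key slots numbered 0..7


def generate_signature(key: bytes, data: bytes, r_first: bytes, r_second: bytes) -> bytes:
    """Stand-in for SIG_KeyRef(data | R | R'): HMAC-SHA1 over the concatenation."""
    return hmac.new(key, data + r_first + r_second, hashlib.sha1).digest()


def validate_keyref_and_signature(keys, key_num, data, r_e, r_l, sig_e):
    """Return 'Pass', 'InvalidKey' or 'BadSignature', in the pseudocode's order."""
    # CheckRange(KeyRef.keyNum)
    if not 0 <= key_num < NUM_KEY_SLOTS or key_num not in keys:
        return "InvalidKey"
    # Recompute the signature locally and compare in constant time.
    sig_l = generate_signature(keys[key_num], data, r_e, r_l)
    if not hmac.compare_digest(sig_l, sig_e):
        return "BadSignature"
    return "Pass"
```

A caller would reject the whole request on any result other than "Pass", as every function sequence in this section does.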

[7294] 27.1.4.2 GenerateFieldSelectAndFieldVal(FieldNumE, FieldDataE, XferSEQ_1FieldNum, XferSEQ_1DataToDevice, XferSEQ_2FieldNum, XferSEQ_2DataToDevice, FieldSelect, FieldVal)

[7295] This function generates the FieldSelect and FieldVal for output from FieldNumE and its final data, and the data to be written to Seq fields SEQ_1 and SEQ_2.

[7296] 27.1.4.3 PresentFlag GetFieldPresent(M1,FieldNum)

[7297] This function checks whether FieldNum is valid.

FieldSize[16] ← 0 # Array to hold FieldSize assuming there are 16 fields
NumFields ← FindNumberOfFieldsInM0(M1,FieldSize) # Refer to Section 19.4.1
If (FieldNum < NumFields)
    PresentFlag ← 1
Else
    PresentFlag ← 0
EndIf
Return PresentFlag

[7298] 27.1.4.4 NumFields FindNumOfFieldsInM0(M1,FieldSize)

[7299] Refer to Section 19.4.1 for details.

[7300] 27.1.4.5 FieldNum GetFieldNum(M1, Type)

[7301] This function returns the field number based on the Type.

FieldSize[16] ← 0 # Array to hold FieldSize assuming there are 16 fields
NumFields ← FindNumberOfFieldsInM0(M1,FieldSize) # Refer to Section 19.4.1
For i = 0 to NumFields
    If (M1[i].Type = Type)
        Return i # This is field Num for matching field
    EndIf
EndFor
i = 255 # If XferSession field was not found then return an invalid value
Return i
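The two field lookups above, GetFieldPresent and GetFieldNum, can be illustrated with a short Python sketch. The `FieldInfo` record and the list representation of M1 are hypothetical simplifications (the real M1 word layout is described in Section 8.1.1); only the value 255 for "not found" comes from the pseudocode.

```python
from dataclasses import dataclass
from typing import List

INVALID_FIELD_NUM = 255  # value the pseudocode returns when no field matches


@dataclass
class FieldInfo:
    """Hypothetical decoded view of one field's M1 attributes."""
    field_type: int


def get_field_present(m1_fields: List[FieldInfo], field_num: int) -> int:
    """1 if field_num indexes an existing field, else 0."""
    return 1 if 0 <= field_num < len(m1_fields) else 0


def get_field_num(m1_fields: List[FieldInfo], wanted_type: int) -> int:
    """Index of the first field whose Type matches, else INVALID_FIELD_NUM."""
    for i, info in enumerate(m1_fields):
        if info.field_type == wanted_type:
            return i
    return INVALID_FIELD_NUM
```

Because at most 16 fields are assumed, 255 can never be a valid index, which is what lets the caller treat it as the invalid marker.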

[7302] 27.1.4.6 PermOK CheckFieldNumEPerm(M1,FieldNumE)

[7303] This function checks authenticated write permission for the FieldNum which holds the upgraded value.

AuthRW ← M1[FieldNumE].AuthRW
NonAuthRW ← M1[FieldNumE].NonAuthRW
If (AuthRW = 1) ∧ (NonAuthRW = 0)
    PermOK ← 1
Else
    PermOK ← 0
EndIf
Return PermOK

[7304] 27.1.4.7 PermOK CheckSeqDataFieldPerms(M1, XferSEQ_1FieldNum, XferSEQ_2FieldNum, FieldNumE)

[7305] This function checks that both SeqData fields have Decrement-Only permission with the same key that has write permission on FieldNumE.

KeyNumForFieldNumE ← M1[FieldNumE].KeyNum # Isolate KeyNum for the field that will be upgraded
# Isolate KeyNum for both SeqData fields and check that they can be written using the same key
KeyNumForSEQ_1 ← M1[XferSEQ_1FieldNum].KeyNum
KeyNumForSEQ_2 ← M1[XferSEQ_2FieldNum].KeyNum
If (KeyNumForSEQ_1 ≠ KeyNumForSEQ_2)
    PermOK ← 0
    Return PermOK
EndIf
# Check that the write key for FieldNumE and SeqData field is not the same
If (KeyNumForSEQ_1 = KeyNumForFieldNumE)
    PermOK ← 0
    Return PermOK
EndIf
# Isolate Decrement-Only permissions with the write key of FieldNumE
KeyPermsSEQ_1 ← M1[XferSEQ_1FieldNum].KeyPerms[KeyNumForFieldNumE]
KeyPermsSEQ_2 ← M1[XferSEQ_2FieldNum].KeyPerms[KeyNumForFieldNumE]
# Check that both sequence fields have Decrement-Only permission for this key
If (KeyPermsSEQ_1 = 0) ∨ (KeyPermsSEQ_2 = 0)
    PermOK ← 0
    Return PermOK
EndIf
PermOK ← 1
Return PermOK
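The three conditions checked by CheckSeqDataFieldPerms can be written compactly. The sketch below assumes a hypothetical `FieldAttrs` record holding each field's write key number and a per-key Decrement-Only bit list of length 8; both are illustrative stand-ins for the M1 attribute words.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldAttrs:
    """Hypothetical decoded M1 attributes for one field."""
    key_num: int  # key with ReadWrite access to the field
    key_perms: List[int] = field(default_factory=lambda: [0] * 8)  # Decrement-Only bit per key


def check_seq_data_field_perms(m1, seq1_field, seq2_field, field_num_e) -> int:
    key_e = m1[field_num_e].key_num
    key_s1 = m1[seq1_field].key_num
    key_s2 = m1[seq2_field].key_num
    # Both sequence fields must share one write key ...
    if key_s1 != key_s2:
        return 0
    # ... and that key must differ from the write key of FieldNumE.
    if key_s1 == key_e:
        return 0
    # The write key of FieldNumE needs Decrement-Only access to both sequence fields.
    if m1[seq1_field].key_perms[key_e] == 0 or m1[seq2_field].key_perms[key_e] == 0:
        return 0
    return 1
```

The effect is that the key used for the transfer may only ever lower the sequence fields, never restore them.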

[7306] 27.1.4.8 AddDataSetToXferEntryCache(ChipId, FieldNumE, FieldNumL, XferVal, SEQ_1Data, SEQ_2Data)

[7307] This function adds a new dataset to the Xfer Entry cache. A dataset is a single record in the Xfer Entry cache. Refer to Section 27 for details.

# Search for matching ChipId, FieldNumE in Cache
DataSet ← SearchDataSetInCache(ChipId, FieldNumE)
# If found
If (DataSet is valid)
    DeleteDataSetInCache(DataSet) # This creates a vacant dataset
    AddRecordToCache(ChipId, FieldNumE, FieldDataL, XferVal, SEQ_1Data, SEQ_2Data)
EndIf
# Searches the cache for XferState complete/deleted
Found ← SearchRecordsInCache(complete/deleted)
If (Found = 1)
    AddRecordToCache(ChipId, FieldNumE, FieldDataL, XferVal, SEQ_1Data, SEQ_2Data)
Else
    # This will overwrite the oldest DataSet in cache
    AddRecordToCache(ChipId, FieldNumE, FieldDataL, XferVal, SEQ_1Data, SEQ_2Data)
    Return
EndIf
Set XferState in record to Xfer
Return
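One way to read the replacement policy above is: replace any record for the same (ChipId, FieldNumE); otherwise reuse a complete/deleted slot; otherwise overwrite the oldest record. The Python sketch below implements that reading. The class names, the capacity, and the state strings are assumptions made for illustration; the specification does not give the cache size in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class XferEntry:
    chip_id: int
    field_num_e: int
    field_num_l: int
    xfer_val: int
    seq1: int
    seq2: int
    state: str = "Xfer"  # "Xfer" while pending, "Complete" once finished or deleted


class XferEntryCache:
    def __init__(self, capacity: int = 4):  # capacity is an assumed value
        self.capacity = capacity
        self.entries: List[XferEntry] = []  # oldest first

    def find(self, chip_id: int, field_num_e: int) -> Optional[XferEntry]:
        for e in self.entries:
            if (e.chip_id, e.field_num_e) == (chip_id, field_num_e):
                return e
        return None

    def add(self, entry: XferEntry) -> None:
        # A record for the same ChipId/FieldNumE is replaced.
        old = self.find(entry.chip_id, entry.field_num_e)
        if old is not None:
            self.entries.remove(old)
        if len(self.entries) >= self.capacity:
            # Prefer a complete/deleted record; otherwise evict the oldest.
            victim = next((e for e in self.entries if e.state == "Complete"), self.entries[0])
            self.entries.remove(victim)
        entry.state = "Xfer"
        self.entries.append(entry)
```

Evicting a pending record means that transfer can no longer be rolled back, so a real device would size the cache for the number of transfers it expects to have in flight.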

[7308] 27.1.4.9 FieldType FindFieldNumType(M1,FieldNum)

[7309] This function gets the Type attribute for a given field.

[7310] FieldType←M1[FieldNum].Type

[7311] Return FieldType

[7312] 27.1.4.10 PermOK CheckFieldNumLPerm(M1, FieldNumL, KeyRef)

[7313] This function checks authenticated write permissions using KeyRef for FieldNumL in the refill device.

AuthRW ← M1[FieldNumL].AuthRW
KeyNumAtt ← M1[FieldNumL].KeyNum
DOForKeys ← M1[FieldNumL].DOForKeys[KeyNum]
# Authenticated write allowed
# ReadWrite key for field is the same as Input KeyRef.keyNum
# Key has both ReadWrite and DecrementOnly Permission
If (AuthRW = 1) ∧ (KeyRef.keyNum = KeyNumAtt) ∧ (DOForKeys = 1)
    PermOK ← 1
Else
    PermOK ← 0
EndIf
Return PermOK

[7314] 27.1.4.11 CheckOK UpgradeValCheck(FieldNum1, M0OfFieldNum1, M1OfFieldNum1, FieldNum2, M0OfFieldNum2, M1OfFieldNum2, KeyRef)

[7315] This function checks the upgrade value corresponding to the count remaining. The upgrade value corresponding to the count remaining field is stored in the lower adjoining field. To upgrade the count remaining field, the upgrade value in the refill device and the device being upgraded must match.

# Check authenticated write permission is allowed to the field
# Check that only one key has ReadWrite access,
# and all other keys have ReadOnly access
PermCheckOKFieldNum1 ← CheckUpgradeKeyForField(FieldNum1,M1OfFieldNum1,KeyRef)
If (PermCheckOKFieldNum1 ≠ 1)
    CheckOK ← 0
    Return CheckOK
EndIf
PermCheckOKFieldNum2 ← CheckUpgradeKeyForField(FieldNum2,M1OfFieldNum2,KeyRef)
If (PermCheckOKFieldNum2 ≠ 1)
    CheckOK ← 0
    Return CheckOK
EndIf
# Get the upgrade value associated with field
GetFieldDataWords(FieldNum1, UpgradeValueFieldNum1, M0OfFieldNum1, M1OfFieldNum1)
# Get the upgrade value associated with field
GetFieldDataWords(FieldNum2, UpgradeValueFieldNum2, M0OfFieldNum2, M1OfFieldNum2)
If (UpgradeValueFieldNum1 ≠ UpgradeValueFieldNum2)
    CheckOK ← 0
    Return CheckOK
EndIf
# Get the type attribute for the field
UpgradeTypeFieldNum1 ← GetUpgradeType(FieldNum1,M1OfFieldNum1)
UpgradeTypeFieldNum2 ← GetUpgradeType(FieldNum2,M1OfFieldNum2)
If (UpgradeTypeFieldNum1 ≠ UpgradeTypeFieldNum2)
    CheckOK ← 0
    Return CheckOK
EndIf
CheckOK ← 1
Return CheckOK

[7316] 27.1.4.12 CheckOK CheckUpgradeKeyForField(FieldNum,M1,KeyRef)

[7317] This function checks that authenticated write permission is allowed to the field. It also checks that only one key has ReadWrite access and all other keys have ReadOnly access. The KeyRef which updates count remaining must not have write access to the upgrade value field.

KeyNum ← M1[FieldNum].KeyNum
AuthRW ← M1[FieldNum].AuthRW
NonAuthRW ← M1[FieldNum].NonAuthRW
DOForKeys ← M1[FieldNum].DOForKeys
# Check that KeyRef doesn't have write permissions to the field
If (KeyRef.keyNum = KeyNum)
    CheckOK ← 0
    Return CheckOK
EndIf
# AuthRW access must be allowed and NonAuthRW access must not be allowed
If (AuthRW = 0) ∨ (NonAuthRW = 1)
    CheckOK ← 0
    Return CheckOK
EndIf
For i ← 0 to 7
    # Keys other than KeyNum are allowed ReadOnly access,
    # DecrementOnly access not allowed for other keys (not KeyNum)
    If (i ≠ KeyNum) ∧ (DOForKeys[i] = 1)
        CheckOK ← 0
        Return CheckOK
    EndIf
    # ReadWrite access allowed for KeyNum,
    # ReadWrite and DecrementOnly access not allowed for KeyNum.
    If (i = KeyNum) ∧ (DOForKeys[i] = 1)
        CheckOK ← 0
        Return CheckOK
    EndIf
EndFor
CheckOK ← 1
Return CheckOK
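As a cross-check of the rules above, here is a small Python sketch of CheckUpgradeKeyForField. The flat argument list is a hypothetical simplification of reading the attributes out of M1. Note that the two tests inside the loop together reject any key with a Decrement-Only bit set, which the sketch keeps explicit to mirror the pseudocode.

```python
from typing import Sequence


def check_upgrade_key_for_field(key_num: int, auth_rw: int, non_auth_rw: int,
                                do_for_keys: Sequence[int], keyref_key_num: int) -> int:
    # The key that updates count-remaining must not be the write key of this field.
    if keyref_key_num == key_num:
        return 0
    # Authenticated writes must be enabled and non-authenticated writes disabled.
    if auth_rw == 0 or non_auth_rw == 1:
        return 0
    for i in range(8):
        # Other keys may only have ReadOnly access (no Decrement-Only).
        if i != key_num and do_for_keys[i] == 1:
            return 0
        # The write key itself must not also carry Decrement-Only.
        if i == key_num and do_for_keys[i] == 1:
            return 0
    return 1
```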

[7318] 27.1.4.13 UpgradeType GetUpgradeType(FieldNum, M1)

[7319] This function gets the type attribute for the upgrade field.

[7320] UpgradeType GetUpgradeType(FieldNum)

[7321] UpgradeType←M1[FieldNum].Type

[7322] Return UpgradeType

[7323] 27.1.4.14 GetFieldDataWords(FieldNum, FieldData, M0, M1)

[7324] This function gets the words corresponding to a given field.

CurrPos ← MaxWordInM
If FieldNum = 0
    CurrPos ← MaxWordInM
Else
    CurrPos ← (M1[FieldNum − 1].EndPos) − 1 # Next lower word after last word of the previous field
EndIf
EndPos ← (M1[FieldNum].EndPos)
j ← 0
For i ← EndPos to CurrPos
    FieldData[j] ← M0[i] # Copy M0 word to FieldData array
    j ← j + 1
EndFor
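The word range computed above runs from the field's own EndPos up to the word just below the previous field's EndPos (or up to the top word for field 0). A minimal Python sketch follows, assuming M0 has 16 words so that MaxWordInM is 15, and assuming the EndPos values have already been extracted from M1 into a list; both are illustrative simplifications.

```python
from typing import List

MAX_WORD_IN_M = 15  # assumption: M0 holds 16 words, indexed 0..15


def get_field_data_words(field_num: int, m0: List[int], end_pos: List[int]) -> List[int]:
    """Return the M0 words of a field, lowest word index first."""
    if field_num == 0:
        curr_pos = MAX_WORD_IN_M
    else:
        # Next lower word after the last word of the previous field.
        curr_pos = end_pos[field_num - 1] - 1
    return [m0[i] for i in range(end_pos[field_num], curr_pos + 1)]
```

With `end_pos = [14, 12, 8]`, field 0 occupies words 14 to 15, field 1 words 12 to 13, and field 2 words 8 to 11.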

[7325] 27.2 StartRollBack

[7326] Input: KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, InputParameterCheck (optional), R_(E), SIG_(E), R_(E2)

[7327] Output: ResultFlag, FieldSelect, FieldVal, R_(L2), SIG_(out)

[7328] Changes: M0 and R_(L)

[7329] Availability: Ink refill QA Device and Parameter Upgrader QA Device

[7330] 27.2.1 Function Description

[7331] The StartRollBack function is used to start a rollback sequence if the QA Device being upgraded didn't receive the transfer message correctly and hence didn't receive the transfer.

[7332] The system calls the function on the upgrading QA Device, passing in FieldNumE and ChipId of the QA Device being upgraded, and FieldNumL of the upgrading QA Device. The upgrading QA Device checks that the QA Device being upgraded didn't actually receive the message correctly, by comparing the values read from the device with the values stored in the Xfer Entry cache. The values compared are the values of the sequence fields. After all checks are fulfilled, the upgrading QA Device produces the new data for the sequence fields and a signature. This is subsequently applied to the QA Device being upgraded (using the WriteFieldAuth function), which updates the sequence fields SEQ_1 and SEQ_2 to the pre-rollback values. However, the new data for the sequence fields and signature can only be applied if the previous data for the sequence fields produced by the Xfer function has not been written.

[7333] The output from the StartRollBack function consists only of the field data of the two sequence fields, and a signature using the refill key. When a pre-rollback output is produced, the sequence field data in SEQ_1 (as stored in the Xfer Entry cache, which is what is passed in to the XferAmount function) is decremented by 1 and the sequence field data in SEQ_2 (as stored in the Xfer Entry cache, which is what is passed in to the XferAmount function) is decremented by 2.
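The interplay between the transfer values (SEQ_1 − 2, SEQ_2 − 1, from the XferAmount pseudocode) and the pre-rollback values (SEQ_1 − 1, SEQ_2 − 2) can be checked with a few lines of Python. The sketch rests on one interpretive assumption: because the sequence fields are writable only with Decrement-Only permission, a write succeeds only if no field would increase. Function names are illustrative.

```python
from typing import Tuple

Seq = Tuple[int, int]  # (SEQ_1, SEQ_2)


def transfer_seq(cached: Seq) -> Seq:
    """Sequence values produced by XferAmount from the cached values."""
    s1, s2 = cached
    return (s1 - 2, s2 - 1)


def pre_rollback_seq(cached: Seq) -> Seq:
    """Sequence values produced by StartRollBack from the cached values."""
    s1, s2 = cached
    return (s1 - 1, s2 - 2)


def decrement_only_write_ok(current: Seq, new: Seq) -> bool:
    """Assumed Decrement-Only rule: every field may stay equal or go down."""
    return all(n <= c for c, n in zip(current, new))
```

Starting from cached values (100, 100), the transfer writes (98, 99) and the pre-rollback writes (99, 98). Either can be applied to the original values, but once one has been written the other would require raising a field, so under this assumption the two outputs are mutually exclusive.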

[7334] An additional InputParameterCheck value must be provided for the parameters not included in SIG_(E) if the transmission between the System and the Ink Refill QA Device is error prone and these errors are not corrected by the transmission protocol itself. InputParameterCheck is SHA-1[FieldNumL|FieldNumE], and is required to ensure the integrity of these parameters when they are received by the Ink Refill QA Device.

[7335] The StartRollBack function must first calculate SHA-1[FieldNumL|FieldNumE], compare the calculated value to the value received (InputParameterCheck), and act upon the inputs only if the values match.
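The InputParameterCheck comparison can be sketched with Python's standard `hashlib`. The byte encoding of FieldNumL|FieldNumE is not specified in this section, so the sketch assumes each field number is encoded as a single byte; that encoding and the function names are assumptions.

```python
import hashlib
import hmac


def input_parameter_check(field_num_l: int, field_num_e: int) -> bytes:
    """SHA-1[FieldNumL|FieldNumE], assuming one byte per field number."""
    return hashlib.sha1(bytes([field_num_l, field_num_e])).digest()


def inputs_intact(field_num_l: int, field_num_e: int, received_check: bytes) -> bool:
    """True only if the received check matches the locally computed digest."""
    expected = input_parameter_check(field_num_l, field_num_e)
    return hmac.compare_digest(expected, received_check)
```

The device would proceed with the rollback logic only when `inputs_intact` returns True. This guards against transmission errors in the two field numbers, which the signature SIG_(E) does not cover.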

[7336] 27.2.2 Input Parameters

[7337] Table 293 describes each of the input parameters for the StartRollBack function.

Parameter: Description
KeyRef: For common key input signature: KeyRef.keyNum = Slot number of the key to be used for testing input signature. SIG_(E) produced using K_(KeyRef.keyNum) by the QA Device being upgraded. KeyRef.useChipId = 0. For variant key input signature: KeyRef.keyNum = Slot number of the key to be used for generating the variant key for testing input signature. SIG_(E) produced using a variant of K_(KeyRef.keyNum) by the QA Device being upgraded. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which generated SIG_(E).
M0OfExternal: All 16 words of M0 of the QA Device being upgraded which failed to upgrade.
M1OfExternal: All 16 words of M1 of the QA Device being upgraded which failed to upgrade.
ChipId: ChipId of the QA Device being upgraded which failed to upgrade.
FieldNumL: M0 field number of the local (refill) device from which the value was supposed to be transferred.
FieldNumE: M0 field number of the QA Device being upgraded to which the value couldn't be transferred.
R_(E): External random value used to verify input signature. This will be the R from the input signature generator (i.e. the device generating SIG_(E)). The input signature generator in this case is the device which failed to upgrade or a translation device.
SIG_(E): External signature required for authenticating input data. The input data in this case is the output from the Read function performed on the device which failed to upgrade. A correct SIG_(E) = SIG_(KeyRef)(Data | R_(E) | R_(L)).

[7338] 27.2.2.1 Input Signature Verification Data Format

[7339] Refer to Section 27.1.2.1.

[7340] 27.2.3 Output Parameters

[7341] Table 294 describes each of the output parameters for the StartRollBack function.

Parameter: Description
ResultFlag: Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1, Table 292 and Table 295.
FieldSelect: Selection of fields to be written. In this case the bits corresponding to SEQ_1 and SEQ_2 are set to 1. All other bits are set to 0.
FieldVal: Updated data for the sequence data fields of the QA Device being upgraded. This must be passed as input to the WriteFieldsAuth function of the QA Device being upgraded.
R_(L2): Internal random value required to generate the output signature. This must be passed as input to the WriteFieldsAuth function or Translate function of the QA Device being upgraded.
SIG_(out): Output signature which must be passed as an input to the WriteFieldsAuth function of the QA Device being upgraded. SIG_(out) = SIG_(KeyRef)(data | R_(L2) | R_(E2)) as per FIG. 373.

[7342] TABLE 295 Result definition for StartRollBack

ResultFlag Definition: Description
RollBackInvalid: RollBack cannot be performed on the request because the parameters for rollback are incorrect.

[7343] 27.2.3.1 SIG_(out)

[7344] Refer to Section 20.2.1 for details.

[7345] 27.2.4 Function Sequence

[7346] The StartRollBack command is illustrated by the following pseudocode:

[7347] Accept input parameters - KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, R_(E), SIG_(E), R_(E2)
# Generate message for passing into ValidateKeyRefAndSignature function
data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|M0|M1) # Refer to FIG. 382.
----------------------------------------------------------------
# Validate KeyRef, and then verify signature
ResultFlag = ValidateKeyRefAndSignature(KeyRef,data,R_(E),R_(L))
If (ResultFlag ≠ Pass)
    Output ResultFlag
    Return
EndIf
----------------------------------------------------------------
# Check Seq Fields Exist and get their Field Num
# Get Seqdata field SEQ_1 num for the device being upgraded
XferSEQ_1FieldNum ← GetFieldNum(M1OfExternal, SEQ_1)
# Check if the Seqdata field SEQ_1 is valid
If (XferSEQ_1FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
# Get Seqdata field SEQ_2 num for the device being upgraded
XferSEQ_2FieldNum ← GetFieldNum(M1OfExternal, SEQ_2)
# Check if the Seqdata field SEQ_2 is valid
If (XferSEQ_2FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
----------------------------------------------------------------
# Get SeqData SEQ_1 data from device being upgraded
GetFieldDataWords(XferSEQ_1FieldNum, XferSEQ_1DataFromDevice, M0OfExternal, M1OfExternal)
# Get SeqData SEQ_2 data from device being upgraded
GetFieldDataWords(XferSEQ_2FieldNum, XferSEQ_2DataFromDevice, M0OfExternal, M1OfExternal)
----------------------------------------------------------------
# Check Xfer Entry in cache is correct - dataset exists, Field data
# and sequence field data matches and Xfer State is correct
XferEntryOK ← CheckEntry(ChipId, FieldNumE, FieldNumL, XferSEQ_1DataFromDevice, XferSEQ_2DataFromDevice)
If (XferEntryOK = 0)
    ResultFlag ← RollBackInvalid
    Output ResultFlag
    Return
EndIf
# Generate Seqdata for SEQ_1 and SEQ_2 fields
XferSEQ_1DataToDevice = XferSEQ_1DataFromDevice − 1
XferSEQ_2DataToDevice = XferSEQ_2DataFromDevice − 2
# Generate FieldSelect and FieldVal for sequence fields SEQ_1 and SEQ_2
CurrentFieldSelect ← 0
FieldVal ← 0
GenerateFieldSelectAndFieldVal(XferSEQ_1FieldNum, XferSEQ_1DataToDevice, XferSEQ_2FieldNum, XferSEQ_2DataToDevice, FieldSelect, FieldVal)
# Generate message for passing into GenerateSignature function
data ← (RWSense|FieldSelect|ChipId|FieldVal) # Refer to FIG. 373.
# Create output signature for FieldNumE
SIG_(out) ← GenerateSignature(KeyRef,data,R_(L2),R_(E2))
Update R_(L2) to R_(L3)
ResultFlag ← Pass
Output ResultFlag, FieldSelect, FieldVal, R_(L2), SIG_(out)
Return

[7348] 27.3 RollBackAmount

[7349] Input: KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, InputParameterCheck (optional), R_(E), SIG_(E)

[7350] Output: ResultFlag

[7351] Changes: M0 and R_(L)

[7352] Availability: Ink refill QA Device

[7353] 27.3.1 Function Description

[7354] The RollBackAmount function finally adjusts the value of FieldNumL of the upgrading QA Device to its value before the transfer request, if the QA Device being upgraded didn't receive the transfer message correctly (and hence was not upgraded).

[7355] The upgrading QA Device checks that the QA Device being upgraded didn't actually receive the transfer message correctly, by comparing the sequence data field values read from the device with the values stored in the Xfer Entry cache. The sequence data field values read must match what was previously written using the StartRollBack function. After all checks are fulfilled, the upgrading QA Device adjusts its FieldNumL.

[7356] An additional InputParameterCheck value must be provided for the parameters not included in SIG_(E) if the transmission between the System and the Ink Refill QA Device is error prone and these errors are not corrected by the transmission protocol itself. InputParameterCheck is SHA-1[FieldNumL|FieldNumE], and is required to ensure the integrity of these parameters when they are received by the Ink Refill QA Device.

[7357] The RollBackAmount function must first calculate SHA-1[FieldNumL|FieldNumE], compare the calculated value to the value received (InputParameterCheck), and act upon the inputs only if the values match.

[7358] 27.3.2 Input Parameters

[7359] Table 296 describes each of the input parameters for the RollBackAmount function.

Parameter: Description
KeyRef: For common key input signature: KeyRef.keyNum = Slot number of the key to be used for testing input signature. SIG_(E) produced using K_(KeyRef.keyNum) by the QA Device being upgraded. KeyRef.useChipId = 0. For variant key input signature: KeyRef.keyNum = Slot number of the key to be used for generating the variant key for testing input signature. SIG_(E) produced using a variant of K_(KeyRef.keyNum) by the QA Device being upgraded. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which generated SIG_(E).
M0OfExternal: All 16 words of M0 of the QA Device being upgraded which failed to upgrade.
M1OfExternal: All 16 words of M1 of the QA Device being upgraded which failed to upgrade.
ChipId: ChipId of the QA Device being upgraded which failed to upgrade.
FieldNumL: M0 field number of the local (refill) device from which the value was supposed to be transferred.
FieldNumE: M0 field number of the QA Device being upgraded to which the value was not transferred.
R_(E): External random value used to verify input signature. This will be the R from the input signature generator (i.e. the device generating SIG_(E)). The input signature generator in this case is the device which failed to upgrade or a translation device.
SIG_(E): External signature required for authenticating input data. The input data in this case is the output from the Read function performed on the device which failed to upgrade. A correct SIG_(E) = SIG_(KeyRef)(Data | R_(E) | R_(L)).

[7360] 27.3.2.1 Input Signature Generation Data Format

[7361] Refer to Section 27.1.2.1 for details.

[7362] 27.3.3 Output Parameters

[7363] Table 297 describes each of the output parameters for RollBackAmount.

Parameter: Description
ResultFlag: Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1, Table 292 and Table 295.

[7364] 27.3.4 Function Sequence

[7365] The RollBackAmount command is illustrated by the following pseudocode:

[7366] Accept input parameters - KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, R_(E), SIG_(E)
# Generate message for passing into ValidateKeyRefAndSignature function
data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|M0|M1) # Refer to Figure 382.
----------------------------------------------------------------
# Validate KeyRef, and then verify signature
ResultFlag = ValidateKeyRefAndSignature(KeyRef,data,R_(E),R_(L))
If (ResultFlag ≠ Pass)
    Output ResultFlag
    Return
EndIf
----------------------------------------------------------------
# Check Seq Fields Exist and get their Field Num
# Get Seqdata field SEQ_1 num for the device being upgraded
XferSEQ_1FieldNum ← GetFieldNum(M1OfExternal, SEQ_1)
# Check if the Seqdata field SEQ_1 is valid
If (XferSEQ_1FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
# Get Seqdata field SEQ_2 num for the device being upgraded
XferSEQ_2FieldNum ← GetFieldNum(M1OfExternal, SEQ_2)
# Check if the Seqdata field SEQ_2 is valid
If (XferSEQ_2FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
----------------------------------------------------------------
# Get SeqData SEQ_1 data from device being upgraded
GetFieldDataWords(XferSEQ_1FieldNum, XferSEQ_1DataFromDevice, M0OfExternal, M1OfExternal)
# Get SeqData SEQ_2 data from device being upgraded
GetFieldDataWords(XferSEQ_2FieldNum, XferSEQ_2DataFromDevice, M0OfExternal, M1OfExternal)
----------------------------------------------------------------
# Generate Seqdata for SEQ_1 and SEQ_2 fields with the data that is read
XferSEQ_1Data = XferSEQ_1DataFromDevice + 1
XferSEQ_2Data = XferSEQ_2DataFromDevice + 2
# Check Xfer Entry in cache is correct - dataset exists, Field data
# and sequence field data matches and Xfer State is correct
XferEntryOK ← CheckEntry(ChipId, FieldNumE, FieldNumL, XferSEQ_1Data, XferSEQ_2Data)
If (XferEntryOK = 0)
    ResultFlag ← RollBackInvalid
    Output ResultFlag
    Return
EndIf
# Get ΔFieldDataL from DataSet
GetVal(ChipId, FieldNumE, ΔFieldDataL)
# Add ΔFieldDataL to FieldNumL
AddValToField(FieldNumL, ΔFieldDataL)
# Update XferState in DataSet to complete/deleted
UpdateXferStateToComplete(ChipId, FieldNumE)
ResultFlag ← Pass
Output ResultFlag
Return
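The core of the RollBackAmount sequence (reconstruct the cached sequence values, check the cache entry, credit the amount back, mark the entry complete) can be sketched as follows. Signature validation is omitted, the cache is modelled as a dictionary keyed by (ChipId, FieldNumE), and the requirement that the entry still be in state "Xfer" is an assumed reading of "Xfer State is correct"; all names are illustrative.

```python
from typing import Dict, Tuple


def rollback_amount(cache: Dict[Tuple[int, int], dict], local_fields: Dict[int, int],
                    chip_id: int, field_num_e: int, field_num_l: int,
                    seq1_read: int, seq2_read: int) -> str:
    entry = cache.get((chip_id, field_num_e))
    # The device should now hold the pre-rollback values (cached - 1, cached - 2),
    # so adding 1 and 2 must reproduce the cached values.
    expected = (seq1_read + 1, seq2_read + 2)
    if (entry is None
            or entry["state"] != "Xfer"
            or entry["field_num_l"] != field_num_l
            or (entry["seq1"], entry["seq2"]) != expected):
        return "RollBackInvalid"
    # Credit the transferred amount back to the local field.
    local_fields[field_num_l] += entry["xfer_val"]
    # Mark the dataset complete so the same transfer cannot be rolled back twice.
    entry["state"] = "Complete"
    return "Pass"
```

If the device instead reports the transfer values (cached − 2, cached − 1), the reconstruction fails and the rollback is refused, so an applied transfer is never refunded.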

[7367] Functions

[7368] Upgrade Device

[7369] (Printer Upgrade)

[7370] 28 Concepts

[7371] This section is very similar to Section 26. The differences between this section and Section 26 have been summarised and underlined, where required.

[7372] 28.1 Purpose

[7373] In a printing application, a printer contains a Printer QA Device, which stores details of the various operating parameters of a printer, some of which may be upgradeable. The upgradeable parameters must be written (initially) and changed in an authorised manner.

[7374] The authorisation for the write or change is achieved by using a Parameter Upgrader QA Device which contains the necessary functions to allow a write or a change of a parameter value (e.g. a print speed) into another QA Device, typically a printer QA Device. This QA Device is also referred to as an upgrading QA Device.

[7375] A parameter upgrader QA Device is able to perform a fixed number of upgrades, and this number is effectively a consumable value. The number of upgrades remaining is also referred to as count-remaining. With each write/change of an operating parameter in a Printer QA Device, the count-remaining decreases by 1, and can be replenished by a value upgrader QA Device.

[7376] The Parameter Upgrader QA Device can also be referred to as the Upgrading QA Device, and the Printer QA Device can also be referred to as the QA Device being upgraded.

[7377] The writing or changing of the parameter can also be referred to as a transfer of a parameter.

[7378] The Parameter Upgrader QA Device copies its parameter value field to the parameter value field of the Printer QA Device, and decrements the count-remaining field associated with the parameter value field by 1.
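Stripped of authentication, the bookkeeping described above is a copy plus a decrement, with a rollback that restores the count. The Python sketch below models each device's fields as a list of integers and passes the field indices explicitly; the function names and this representation are illustrative assumptions, and signature checks and sequence fields are deliberately left out.

```python
from typing import List


def transfer_parameter(upgrader: List[int], value_field: int, count_field: int,
                       printer: List[int], printer_field: int) -> bool:
    """Copy the parameter value to the printer and consume one upgrade."""
    if upgrader[count_field] <= 0:
        return False  # no upgrades remaining
    printer[printer_field] = upgrader[value_field]
    upgrader[count_field] -= 1
    return True


def rollback_parameter(upgrader: List[int], count_field: int) -> None:
    """Return one upgrade to the upgrader after a failed transfer."""
    upgrader[count_field] += 1
```

For example, an upgrader holding a print-speed value of 30 with one upgrade remaining can upgrade a single printer; a second attempt is refused until the count is replenished or a failed transfer is rolled back.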

[7379] 28.2 Requirements

[7380] The transfer of a parameter has two basic requirements:

[7381] The transfer can only be performed if the transfer request is valid. The validity of the transfer request must be completely checked by the Parameter Upgrader QA Device before it produces the required output for the transfer. It must not be possible to apply the transfer output to the Printer QA Device if the Parameter Upgrader QA Device has already been rolled back for that particular transfer.

[7382] A process of rollback is available if the transfer was not received by the Printer QA Device.

[7383] A rollback is performed only if the rollback request is valid. The validity of the rollback request must be completely checked by the Parameter Upgrader QA Device before the count-remaining value is incremented by 1. It must not be possible to roll back a Parameter Upgrader QA Device for a transfer which has already been applied to the Printer QA Device, i.e. the Parameter Upgrader QA Device must only be rolled back for transfers that have actually failed.

[7384] 28.3 Basic Scheme

[7385] The transfer and rollback process is shown in FIG. 383.

[7386] Following is a sequential description of the transfer and rollback process:

[7387] 1. The System Reads the memory vectors M0 and M1 of the Printer QA Device. The output from the read, which includes the M0 and M1 words of the Printer QA Device and a signature, is passed as an input to the Transfer Request. It is essential that M0 and M1 are read together. This ensures that the field information for M0 fields is correct, and has not been modified or substituted from another device. The entire M0 and M1 must be read to verify the correctness of the subsequent Transfer Request by the Parameter Upgrader QA Device.

[7388] 2. The System makes a Transfer Request to the Parameter Upgrader QA Device with the field in the Parameter Upgrader QA Device whose data will be copied to the Printer QA Device, and the field in the Printer QA Device to which this data will be copied. The Transfer Request also includes the output from the Read of the Printer QA Device. The Parameter Upgrader QA Device validates the Transfer Request based on the Read output, checks that it has enough count-remaining for a successful transfer, and then produces the necessary Transfer Output. The Transfer Output typically consists of new field data for the field being refilled or upgraded, additional field data required to ensure the correctness of transfer/rollback, and a signature.

[7389] 3. The System then applies the Transfer Output on the Printer QA Device, by calling an authenticated Write on it, passing in the Transfer Output. The Write is either successful or not. If the Write is not successful, then the System will repeat calling the Write function using the same Transfer Output, which may be successful or not. If unsuccessful, the System will initiate a rollback of the transfer. The rollback must be performed on the Parameter Upgrader QA Device, so that it can adjust its value to a previous value before the current Transfer Request was initiated.

[7390] 4. The System starts a rollback by Reading the memory vectors M0 and M1 of the Printer QA Device.

[7391] 5. The System makes a StartRollBack Request to the Parameter Upgrader QA Device with the same input parameters as the Transfer Request, and the output from the Read in (4). The Parameter Upgrader QA Device validates the StartRollBack Request based on the Read output, and then produces the necessary Pre-rollback output. The Pre-rollback output typically consists only of additional field data along with a signature.

[7392] 6. The System then applies the Pre-rollback output to the Printer QA Device by calling an authenticated Write on it, passing in the Pre-rollback output. The Write is either successful or not. If the Write is not successful, then either (6), or (5) and (6), must be repeated.

[7393] 7. The System then Reads the memory vectors M0 and M1 of the Printer QA Device.

[7394] 8. The System makes a RollBack Request to the Parameter Upgrader QA Device with the same input parameters as the Transfer Request, and the output from the Read in (7). The Parameter Upgrader QA Device validates the RollBack Request based on the Read output, and then rolls back its count-remaining field by incrementing it by 1.
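By way of a non-limiting illustration, the eight steps above can be sketched in Python from the System's point of view. The `printer` and `upgrader` objects, their method names (`read`, `write`, `transfer`, `start_rollback`, `rollback`), the retry limit and the returned status strings are all hypothetical and are not part of the QA Chip Logical Interface. The sketch assumes that the pre-rollback output is written to the Printer QA Device, as described in Section 28.3.3.3.

```python
MAX_WRITE_ATTEMPTS = 3  # assumed retry limit; the text only says the Write is repeated


def run_transfer(printer, upgrader, request, max_attempts=MAX_WRITE_ATTEMPTS):
    """Drive one transfer and, if the Write keeps failing, the rollback.

    `printer` and `upgrader` are hypothetical stand-ins for the Printer QA
    Device and the Parameter Upgrader QA Device.
    """
    # Steps 1-2: read M0 and M1 together, then request the transfer output.
    read_output = printer.read()
    transfer_output = upgrader.transfer(request, read_output)

    # Step 3: authenticated Write, retried with the same transfer output.
    for _ in range(max_attempts):
        if printer.write(transfer_output):
            return "transferred"

    # Steps 4-6: read again, request the pre-rollback output, and write it.
    # Here both (5) and (6) are repeated on failure.
    for _ in range(max_attempts):
        read_output = printer.read()
        pre_rollback = upgrader.start_rollback(request, read_output)
        if printer.write(pre_rollback):
            break
    else:
        return "rollback_incomplete"

    # Steps 7-8: read once more so the upgrader can verify the pre-rollback
    # values before incrementing its count-remaining field.
    read_output = printer.read()
    upgrader.rollback(request, read_output)
    return "rolled_back"
```

The rollback call is reached only after the pre-rollback Write has succeeded, which mirrors the requirement that the Parameter Upgrader QA Device confirms the state of the Printer QA Device before restoring its count.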

[7395] 28.3.1 Transfer

[7396] The Printer QA Device stores upgradeable operating parameter values in M0 fields, and its corresponding M1 words contain the field information for its operating parameter fields. The field information consists of the size of the field, the Type of data stored in the field and the access permission to the field. See Section 8.1.1 for details.

[7397] The Parameter Upgrader QA Device also stores the new operating parameter values (which will be written to the Printer QA Device) in its M0 fields, and its corresponding M1 words contain the field information for the new operating parameter fields. Additionally, the Parameter Upgrader QA Device has a count-remaining field associated with each new operating parameter value field. The count-remaining field occupies the next higher field position relative to its associated operating parameter value field.

[7398] 28.3.1.1 Authorisation

[7399] The basic authorisation for a transfer comes from a key which has authenticated ReadWrite permission (stored in field information as KeyNum) to the operating parameter field in the Printer QA Device. We will refer to this key as the upgrade key. The same upgrade key must also have authenticated decrement-only permission to the count-remaining field (which decrements by 1 with every transfer) in the Parameter Upgrader QA Device.

[7400] After validating the input upgrade request, the Parameter Upgrader QA Device will decrement the value of the count-remaining field by 1, and produce data (by copying the data stored in its operating parameter field) and a signature for the new operating parameter using the upgrade key. Note that the Parameter Upgrader QA Device can decrement its count-remaining field only if the upgrade key has the permission to decrement it.

[7401] The data and signature produced by the Parameter Upgrader QA Device are subsequently applied to the Printer QA Device. The Printer QA Device will accept the new transferred operating parameter only if the signature is valid. Note that the signature will only be valid if it was produced using the upgrade key, which has write permission to the operating parameter field being written.

[7402] The upgrade key has authenticated ReadWrite permission to the operating parameter field (which will change) in the Printer QA Device. The upgrade key has decrement-only permission to the count-remaining field (which decrements by 1 with every transfer of a field) in the Parameter Upgrader QA Device.

[7403] 28.3.1.2 Data Type Matching

[7404] The Parameter Upgrader QA Device validates the transfer request by matching the Type of the data in the field information of the operating parameter field (stored in M1) of the Printer QA Device to the Type of data in the field information of the operating parameter field of the Parameter Upgrader QA Device. This ensures that equivalent data types are being transferred, i.e. Network_OEM1_printspeed_1500 is not transferred to Network_OEM1_printspeed_2000.

[7405] 28.3.1.3 Additional Validation

[7406] Additional validation of the transfer request must be performed before a transfer output is generated by the Parameter Upgrader QA Device. The checks are as follows:

[7407] For the Printer QA Device:

[7408] 1. Whether the field being upgraded is actually present.

[7409] 2. Whether the field being upgraded can hold the changed value.

[7410] For the Parameter Upgrader QA Device:

[7411] 1. Whether the new operating parameter field and its associated count-remaining field are actually present.

[7412] 2. Whether the count-remaining field has an upgrade left for the transfer to succeed.
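As a minimal sketch only, the data type matching of Section 28.3.1.2 and the additional checks listed above might be combined as follows. The `FieldInfo` record, the dictionary representation of the M1 field information and most of the result strings are assumptions made for illustration; only `FieldNumEInvalid` and `FieldNumLInvalid` appear as result names elsewhere in this description. The count-remaining field is taken to be the field at `FieldNumL + 1`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FieldInfo:
    """Hypothetical subset of the M1 field information for one M0 field."""
    type_code: str   # Type of the data stored in the field
    size_words: int  # size of the field in words


def validate_transfer(printer_fields: Dict[int, FieldInfo],
                      upgrader_fields: Dict[int, FieldInfo],
                      field_num_e: int,
                      field_num_l: int,
                      count_remaining: int) -> str:
    """Return "Pass" or an illustrative failure reason."""
    # Printer QA Device: the field being upgraded must be present.
    target = printer_fields.get(field_num_e)
    if target is None:
        return "FieldNumEInvalid"
    # Parameter Upgrader QA Device: the new operating parameter field and
    # its count-remaining field (next higher field) must be present.
    source = upgrader_fields.get(field_num_l)
    if source is None:
        return "FieldNumLInvalid"
    if field_num_l + 1 not in upgrader_fields:
        return "CountRemainingFieldInvalid"
    # Data type matching: only equivalent Types may be transferred.
    if source.type_code != target.type_code:
        return "TypeMismatch"
    # The field being upgraded must be able to hold the changed value.
    if source.size_words > target.size_words:
        return "FieldTooSmall"
    # At least one upgrade must be left.
    if count_remaining < 1:
        return "NoCountRemaining"
    return "Pass"
```

For example, with a printer field 3 and an upgrader field 5 of the same Type, and a count-remaining field 6 holding a value of 2, the function returns "Pass"; with a count of 0 it returns "NoCountRemaining".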

[7413] 28.3.1.4 Rollback Facilitation

[7414] To facilitate a rollback, the Parameter Upgrader QA Device will store a list of transfer requests processed by it. This list is referred to as the Xfer Entry cache. Each record in the list consists of the transfer parameters corresponding to the transfer request.

[7415] 28.3.2 Rollback

[7416] A rollback request will be validated by looking through the Xfer Entry cache of the Parameter Upgrader QA Device. After the right transfer request is found, the Parameter Upgrader QA Device checks that the output from the transfer request was not applied to the Printer QA Device, by comparing the current Read of the Printer QA Device to the values in the Xfer Entry cache, and finally rolls back its count-remaining field by incrementing it by 1. The Parameter Upgrader QA Device must be absolutely sure that the Printer QA Device did not receive the transfer. This requirement determines the additional fields that must be written along with the new operating parameter data, and also the parameters of the transfer request that must be stored in the Xfer Entry cache to facilitate a rollback, in order to prove that the Printer QA Device did not actually receive the transfer.

[7417] The rollback process increments the count-remaining field by 1 in the Parameter Upgrader QA Device.

[7418] 28.3.2.1 Sequence Fields

[7419] The rollback process must ensure that the transfer output (which was previously produced) for which the rollback is being performed cannot be applied after the rollback has been performed. This is achieved as follows. There are two separate decrement-only sequence fields (SEQ_1 and SEQ_2) in the Printer QA Device which can only be decremented by the Parameter Upgrader QA Device using the upgrade key. The nature of the data to be written to the sequence fields is such that either the transfer output or the pre-rollback output can be applied to the Printer QA Device, but not both, i.e. they must be mutually exclusive. Refer to Table 285 for details.

[7420] The two sequence fields are initialised to 0xFFFFFFFF using the sequence key. The sequence key is different to the upgrade key, and has authenticated ReadWrite permission to both the sequence fields.

[7421] The transfer output consists of the new data for the field being upgraded, field data of the two sequence fields, and a signature using the upgrade key. The field data for SEQ_1 is decremented by 2 from the original value that was passed in with the transfer request. The field data for SEQ_2 is decremented by 1 from the original value that was passed in with the transfer request.

[7422] The pre-rollback output consists only of the field data for the two sequence fields, and a signature using the upgrade key. The field data for SEQ_1 is decremented by 1 from the original value that was passed in with the transfer request. The field data for SEQ_2 is decremented by 2 from the original value that was passed in with the transfer request.

[7423] Since the two sequence fields are decrement-only fields, only one of the two possible sets of sequence data can be written. Writing the transfer output to the QA Device being upgraded will therefore prevent the writing of the pre-rollback output to that device. If the writing of the transfer output fails, then the pre-rollback output can be written. However, the transfer output cannot be written after the pre-rollback output has been written.

[7424] Before a rollback is performed, the Parameter Upgrader QA Device must confirm that the sequence fields were successfully written to the pre-rollback values in the Printer QA Device. Because the sequence fields are decrement-only fields, the Printer QA Device will allow the pre-rollback output to be written only if the transfer output has not been written.
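The mutual exclusivity of the two outputs can be illustrated with a short Python sketch. The `SequenceFields` class and the helper names are hypothetical, and the sketch assumes that a decrement-only field rejects any write that would increase its value. Starting from the values (N, N), the transfer output carries (N-2, N-1) and the pre-rollback output carries (N-1, N-2); whichever is written first, the other would require one of the two fields to increase and is therefore refused.

```python
from dataclasses import dataclass
from typing import Tuple

SEQ_INIT = 0xFFFFFFFF  # assumed initial value written with the sequence key


@dataclass
class SequenceFields:
    """Hypothetical model of SEQ_1 and SEQ_2 in the Printer QA Device."""
    seq_1: int = SEQ_INIT
    seq_2: int = SEQ_INIT

    def write(self, new_seq_1: int, new_seq_2: int) -> bool:
        """Accept the write only if neither field would increase."""
        if new_seq_1 > self.seq_1 or new_seq_2 > self.seq_2:
            return False
        self.seq_1, self.seq_2 = new_seq_1, new_seq_2
        return True


def transfer_sequence_data(seq_1: int, seq_2: int) -> Tuple[int, int]:
    """Sequence data in the transfer output: SEQ_1 - 2 and SEQ_2 - 1."""
    return seq_1 - 2, seq_2 - 1


def pre_rollback_sequence_data(seq_1: int, seq_2: int) -> Tuple[int, int]:
    """Sequence data in the pre-rollback output: SEQ_1 - 1 and SEQ_2 - 2."""
    return seq_1 - 1, seq_2 - 2
```

Both outputs are computed from the same original values, as passed in with the transfer request. Once `write(*transfer_sequence_data(n, n))` has succeeded, `write(*pre_rollback_sequence_data(n, n))` returns False because SEQ_1 would have to rise from n-2 to n-1; in the opposite order, the transfer output is refused because SEQ_2 would have to rise from n-2 to n-1.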

[7425] 28.3.2.1.1 Field Information of the Sequence Data Field

[7426] For a device to be upgradeable, the device must have two sequence fields, SEQ_1 and SEQ_2, which are written with sequence data during the transfer sequence. Thus all upgrading QA Devices, ink QA Devices and printer QA Devices must have two sequence fields. The upgrading QA Devices must have these fields because they can be upgraded as well. The sequence field information is defined in Table 298.

Attribute Name | Value | Explanation
Type | TYPE_SEQ_1 or TYPE_SEQ_2. | See Appendix A for exact data.
KeyNum | Slot number of the sequence key. | Only the sequence key has authenticated ReadWrite access to this field.
Non Auth RW Perm^(b) | 0 | Non authenticated ReadWrite is not allowed to the field.
Auth RW Perm^(c) | 1 | Authenticated (key based) ReadWrite access is allowed to the field.
KeyPerm | KeyPerms[KeyNum] = 0 | KeyNum is the slot number of the sequence key, which has ReadWrite permission to the field.
KeyPerm | KeyPerms[Slot number of upgrade key] = 1 | Upgrade key can decrement the sequence field.
KeyPerm | KeyPerms[others = 0 . . . 7 (except upgrade key)] = 0 | All other keys have ReadOnly access.
End Pos | Set as required. | Size is typically 1 word.

[7427] 28.3.3 Upgrade States

[7428] There are three states in a transfer sequence. The first state is initiated for every transfer, while the next two states are initiated only when the transfer fails. The states are Xfer, StartRollBack, and Rollback.

[7429] 28.3.3.1 Upgrade Flow

[7430] FIG. 384 shows a typical upgrade flow.

[7431] 28.3.3.2 Xfer

[7432] This state indicates the start of the transfer process, and is the only state required if the transfer is successful. During this state, the Parameter Upgrader QA Device adds a new record to its Xfer Entry cache, decrements its count-remaining by 1, and produces the new operating parameter field, new sequence data (as described in Section 28.3.2.1) and a signature based on the upgrade key.

[7433] The Printer QA Device will subsequently write the new operating parameter field and new sequence data, after verifying the signature. If the new operating parameter field can be successfully written to the Printer QA Device, then this will finish a successful transfer.

[7434] If the writing of the new amount is unsuccessful (result returned is BAD SIG), the System will re-transmit the transfer output to the Printer QA Device, by calling the authenticated Write function on it again, using the same transfer output.

[7435] If retrying to write the same transfer output fails repeatedly, the System will start the rollback process on the Parameter Upgrader QA Device, by calling the Read function on the Printer QA Device, and subsequently calling the StartRollBack function on the Parameter Upgrader QA Device. After a successful rollback is performed, the System will invoke the transfer sequence again.

[7436] 28.3.3.3 StartRollBack

[7437] This state indicates the start of the rollback process. During this state, the Parameter Upgrader QA Device produces the next sequence data and a signature based on the upgrade key. This is also called a pre-rollback, as described in Section 26.3.2.

[7438] The pre-rollback output can only be written to the Printer QA Device if the previous transfer output has not been written. The writing of the pre-rollback sequence data also ensures that if the previous transfer output was captured and not applied, then it cannot be applied to the Printer QA Device in the future.

[7439] If the writing of the pre-rollback output is unsuccessful (result returned is BAD SIG), the System will re-transmit the pre-rollback output to the Printer QA Device, by calling the authenticated Write function on it again, using the same pre-rollback output.

[7440] If retrying to write the same pre-rollback output fails repeatedly, the System will call StartRollBack on the Parameter Upgrader QA Device again, and subsequently call the authenticated Write function on the Printer QA Device using this output.

[7441] 28.3.3.4 Rollback

[7442] This state indicates a successful deletion (completion) of a transfer sequence. During this state, the Parameter Upgrader QA Device verifies that the sequence data produced from StartRollBack has been correctly written to the Printer QA Device, then rolls its count-remaining field back to the value it held before the transfer request was issued.
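The three states and the order in which they may follow one another can be sketched as a small state machine. This is an illustrative reading of the description above rather than a defined interface: the `XferState` enumeration, the `ALLOWED` table and the `advance` helper are hypothetical names. StartRollBack is allowed to follow itself because the System may call StartRollBack again when the pre-rollback Write keeps failing.

```python
from enum import Enum


class XferState(Enum):
    XFER = "Xfer"
    START_ROLLBACK = "StartRollBack"
    ROLLBACK = "Rollback"


# Permitted next states for each state of a transfer sequence.
ALLOWED = {
    XferState.XFER: {XferState.START_ROLLBACK},
    XferState.START_ROLLBACK: {XferState.START_ROLLBACK, XferState.ROLLBACK},
    XferState.ROLLBACK: set(),  # the transfer sequence is complete
}


def advance(current: XferState, requested: XferState) -> XferState:
    """Return the new state, or raise ValueError for an out-of-order request."""
    if requested not in ALLOWED[current]:
        raise ValueError(f"{requested.value} cannot follow {current.value}")
    return requested
```

A successful transfer never leaves the Xfer state; a failed one moves through StartRollBack to Rollback, and a Rollback requested directly after Xfer is rejected.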

[7443] 28.3.4 Xfer Entry Cache

[7444] The Xfer Entry data structure must allow for the following:

[7445] Store the transfer state and sequence data for a given transfer sequence.

[7446] Store all data corresponding to a given transfer, to facilitate a rollback to the previous value before the transfer output was generated.

[7447] The Xfer Entry cache depth will depend on the QA Chip Logical Interface implementation. For some implementations a single Xfer Entry value will be saved. If the Parameter Upgrader QA Device has no powersafe storage for the Xfer Entry cache, a power down will erase the Xfer Entry cache and the Parameter Upgrader QA Device will not be able to roll back to a pre-power-down value.

[7448] A dataset in the Xfer Entry cache will consist of the following:

[7449] Information about the Printer QA Device:

[7450] a. ChipId of the device.

[7451] b. FieldNum of the M0 field (i.e. what was being upgraded).

[7452] Information about the Parameter Upgrader QA Device:

[7453] a. FieldNum of the M0 field used to transfer the count-remaining from.

[7454] Xfer State, indicating which state the transfer sequence is in. This will consist of:

[7455] a. State definition, which could be one of the following: Xfer, StartRollBack and deleted (completed).

[7456] b. The value of sequence data fields SEQ_1 and SEQ_2.

[7457] The Xfer Entry cache stores the FieldNum of the count-remaining field of the Parameter Upgrader QA Device.

[7458] 28.3.4.1 Adding New Dataset

[7459] A new dataset is added to Xfer Entry cache by the Xfer function.

[7460] There are three methods which can be used to add a new dataset to the Xfer Entry cache. The methods are listed below in order of priority:

[7461] 1. Replace an existing dataset in the Xfer Entry cache with the new dataset, based on the ChipId and FieldNum of the Printer QA Device in the new dataset. A matching ChipId and FieldNum could be found because a previous transfer output corresponding to the dataset stored in the Xfer Entry cache has been correctly received and processed by the Parameter Upgrader QA Device, and a new transfer request for the same Printer QA Device and the same field has come through to the Parameter Upgrader QA Device.

[7462] 2. Replace an existing dataset in the cache with the new dataset, based on the Xfer State. If the Xfer State for a dataset indicates deleted (complete), then such a dataset will not be used for any further functions, and can be overwritten by a new dataset.

[7463] 3. Add the new dataset to the end of the cache. This will automatically delete the oldest dataset from the cache regardless of its Xfer State.
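The three methods, applied in priority order, can be sketched as follows. The `XferEntry` record, the list-based cache with the oldest entry first, and the explicit `depth` argument are assumptions made for illustration; the sketch also assumes that method 3 only evicts the oldest dataset once the cache is full, and that an in-place replacement keeps the position of the dataset it replaces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class XferEntry:
    """Hypothetical dataset layout following Section 28.3.4."""
    chip_id: int      # ChipId of the Printer QA Device
    field_num_e: int  # FieldNum of the M0 field being upgraded
    field_num_l: int  # FieldNum of the count-remaining field in the upgrader
    state: str        # "Xfer", "StartRollBack" or "Deleted"
    seq_1: int
    seq_2: int


def add_dataset(cache: List[XferEntry], new: XferEntry, depth: int) -> None:
    """Add `new` to `cache` (oldest first), holding at most `depth` entries."""
    # Method 1: replace a dataset for the same device and the same field.
    for i, entry in enumerate(cache):
        if entry.chip_id == new.chip_id and entry.field_num_e == new.field_num_e:
            cache[i] = new
            return
    # Method 2: overwrite a dataset whose Xfer State is deleted (complete).
    for i, entry in enumerate(cache):
        if entry.state == "Deleted":
            cache[i] = new
            return
    # Method 3: append, dropping the oldest dataset regardless of its state.
    cache.append(new)
    if len(cache) > depth:
        cache.pop(0)
```

With a depth of 1 the function degenerates to the single-entry case mentioned above, in which every new transfer request displaces the previous dataset.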

[7464] 28.4 Upgrading the Count-Remaining Field

[7465] This section is only applicable to the Parameter Upgrader QA Device.

[7466] The transfer of count-remaining is similar to the transfer of ink-remaining because both involve the transfer of amounts. Therefore, this transfer uses the XferAmount function.

[7467] The XferAmount function performs additional checks when transferring count-remaining. These include checks on the operating parameter field associated with the count-remaining. They are as follows:

[7468] The operating parameter value of the upgrading QA Device and the QA Device being upgraded must match.

[7469] The operating parameter field (in both devices) must be upgradeable by one key only, and all other keys must have ReadOnly access. This key, which has authenticated ReadWrite permission to the operating parameter field, must be different to the key that has authenticated ReadWrite permission to the count-remaining field.

[7470] The data Type for the operating parameter field in the upgrading QA Device must match the data Type for the operating parameter field in the QA Device being upgraded.

[7471] 28.5 New Operating Parameter Field Information

[7472] This section is only applicable to the Parameter Upgrader QA Device.

[7473] This field stores the operating parameter value that is copied from the Parameter Upgrader QA Device to the operating parameter field being updated in the Printer QA Device.

[7474] This field has a single key associated with it. This key has authenticated ReadWrite permission to this field and will be referred to as the write-parameter key.

[7475] Table 299 shows the field information for the new operating parameter field in the Parameter Upgrader QA Device.

Attribute Name | Value | Explanation
Type | For e.g. TYPE_UPGRADE_PRINTSPEED_15^(a) | Type describing the upgrade.
KeyNum | Slot number of the write-parameter key. | Only the write-parameter key has authenticated ReadWrite access to this field.
Non Auth RW Perm^(b) | 0 | Non authenticated ReadWrite is not allowed to the field.
Auth RW Perm^(c) | 1 | Authenticated (key based) ReadWrite access is allowed to the field.
KeyPerm | KeyPerms[KeyNum] = 0 | KeyNum is the slot number of the write-parameter key which has ReadWrite permission to the field.
KeyPerm | KeyPerms[others = 0 . . . 7] = 0 | All other keys have ReadOnly access.
End Pos | Set as required. |

[7476] 28.6 Different Types of Transfer

[7477] There can be three types of transfer:

[7478] Parameter Transfer: This is the transfer of an operating parameter value from a Parameter Upgrader QA Device to a Printer QA Device. This is performed when an upgradeable operating parameter is written (for the first time) or changed.

[7479] Hierarchical refill: This is a transfer of a count-remaining value from one Parameter Upgrader Refill QA Device to a Parameter Upgrader QA Device, where both QA Devices belong to the same OEM. This is typically performed when an OEM divides the number of upgrades from one of its Parameter Upgrader QA Devices among many of its Parameter Upgrader QA Devices.

[7480] Peer to Peer refill: This is a transfer of a count-remaining value from one Parameter Upgrader Refill QA Device to another Parameter Upgrader Refill QA Device, where the QA Devices belong to different organisations, say ComCo and OEM. This is typically performed when ComCo divides the number of upgrades from its Parameter Upgrader QA Device among several Parameter Upgrader QA Devices belonging to several OEMs.

[7481] Transfer of count-remaining between peers, and hierarchical transfer of count-remaining, are similar to an ink transfer, but additional checks on the transfer request are performed when transferring count-remaining amounts. This is described in Section 28.4.1.

[7482] Transfer of an operating parameter value decrements the count-remaining by 1, and hence is different to an ink transfer.

[7483] FIG. 385 is a representation of various authorised upgrade paths in the printing system.

[7484] 28.6.1 Hierarchical Transfers

[7485] Referring to FIG. 385, this transfer is typically performed when a count-remaining amount is transferred from ComCo's Parameter Upgrader Refill QA Device to an OEM's Parameter Upgrader Refill QA Device, or from QACo's Parameter Upgrader Refill QA Device to ComCo's Parameter Upgrader Refill QA Device.

[7486] These transfers are made using the XferAmount function (and not with the XferField function described in Section 29.1), because a count-remaining transfer is similar to the fill/refill of ink amounts, with the ink amount replaced by the count-remaining amount.

[7487] 28.6.1.1 Keys and Access Permission

[7488] We will explain this using a transfer from ComCo to OEM.

[7489] There is a count-remaining field associated with the ComCo's Parameter Upgrader Refill QA Device. This count-remaining field has two keys associated with it:

[7490] The first key transfers count-remaining to the device from another Parameter Upgrader Refill QA Device (which is higher in the hierarchy), i.e. it fills/refills the device itself.

[7491] The second key transfers count-remaining from it to other devices (which are lower in the hierarchy), i.e. it fills/refills other devices from it.

[7492] There is a count-remaining field associated with the OEM's Parameter Upgrader Refill QA Device.

[7493] This count-remaining field has a single key associated with it:

[7494] This key transfers count-remaining to the device from another Parameter Upgrader Refill QA Device (which is higher or at the same level in the hierarchy), i.e. fills/refills (upgrades) the device itself, and additionally transfers count-remaining from it to other devices (which are lower in the hierarchy), i.e. fills/refills (upgrades) other devices from it.

[7495] For a successful transfer of count-remaining from ComCo's refill device to an OEM's refill device, the ComCo's refill device and the OEM's refill device must share a common key or a variant key.

[7496] This key is the fill/refill key with respect to the OEM's refill device and it is the transfer key with respect to the ComCo's refill device.

[7497] For a ComCo to successfully fill/refill its refill device from another refill device (which is higher in the hierarchy, possibly belonging to the QACo), the ComCo's refill device and the QACo's refill device must share a common key or a variant key. This key is the fill/refill key with respect to the ComCo's refill device and it is the transfer key with respect to the QACo's refill device.

[7498] 28.6.1.1.1 Count-Remaining Field Information

[7499] Table 300 shows the field information for an M0 field storing logical count-remaining amounts in the refill device, which has the ability to transfer down the hierarchy.

[7500] Attribute Name | Value | Explanation
Type | TYPE_COUNT_REMAINING^(a) | Type describes that the field is a count-remaining field.
KeyNum | Slot number of the refill key. | Only the refill key has authenticated ReadWrite access to this field.
Non Auth RW Perm^(b) | 0 | Non authenticated ReadWrite is not allowed to the field.
Auth RW Perm^(c) | 1 | Authenticated (key based) ReadWrite access is allowed to the field.
KeyPerm | KeyPerms[KeyNum] = 0 | KeyNum is the slot number of the refill key, which has ReadWrite permission to the field.
KeyPerm | KeyPerms[Slot Num of transfer key] = 1 | Transfer key can decrement the field.
KeyPerm | KeyPerms[others = 0 . . . 7 (except transfer key)] = 0 | All other keys have ReadOnly access.
End Pos | Set as required. | Depends on the amount of logical ink the device can store and storage resolution, i.e. in picolitres or in microlitres.

[7501] 28.6.2 Peer to Peer Transfer

[7502] Referring to FIG. 385, this transfer is typically performed when a count-remaining amount is transferred from an OEM's Parameter Upgrader Refill QA Device to another Parameter Upgrader Refill QA Device belonging to the same OEM.

[7503] 28.6.2.1 Keys and Access Permission

[7504] There is a count-remaining field associated with the refill device. This count-remaining field has a single key associated with it:

[7505] This key transfers a count-remaining amount to the device from another refill device (which is higher or at the same level in the hierarchy), i.e. fills/refills (upgrades) the device itself, and additionally transfers count-remaining from it to other devices (which are lower in the hierarchy), i.e. fills/refills (upgrades) other devices from it.

[7506] This key is referred to as the fill/refill key and is used for both fill/refill and transfer. Hence, this key has both ReadWrite and Decrement-Only permission to the count-remaining field in the refill device.

[7507] 28.6.2.1.1 Count-Remaining Field Information

[7508] Table 301 shows the field information for an M0 field storing logical count-remaining amounts in the refill device with the ability to transfer between peers.

TABLE 301 Field information for count-remaining field for refill devices transferring between peers

Attribute Name | Value | Explanation
Type | TYPE_COUNT_REMAINING^(a) | Type describes that the field is a count-remaining field.
KeyNum | Slot number of the refill key. | Only the refill key has authenticated ReadWrite access to this field.
Non Auth RW Perm^(b) | 0 | Non authenticated ReadWrite is not allowed to the field.
Auth RW Perm^(c) | 1 | Authenticated (key based) ReadWrite access is allowed to the field.
KeyPerm | KeyPerms[KeyNum] = 1 | KeyNum is the slot number of the refill key, which has ReadWrite and Decrement permission to the field.
KeyPerm | KeyPerms[others = 0 . . . 7 (except KeyNum)] = 0 | All other keys have ReadOnly access.
End Pos | Set as required. | Depends on the amount of logical ink the device can store and storage resolution, i.e. in picolitres or in microlitres.

[7509] 29 Functions

[7510] 29.1 XferField

[7511] Input: KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, InputParameterCheck (Optional), R_(E), SIG_(E), R_(E2)

[7512] Output: ResultFlag, Field data, R_(L2), SIG_(out)

[7513] Changes: M0 and R_(L)

[7514] Availability: Parameter Upgrader QA Device

[7515] 29.1.1 Function Description

[7516] The XferField function is similar to the XferAmount function in that it produces data and a signature for updating a given M0 field. This data and signature, when applied to the appropriate device through the WriteFieldsAuth function, will upgrade FieldNumE (an M0 field) of a device to the same value as FieldNumL of the upgrading device.

[7517] The System calls the XferField function on the upgrading device with a certain FieldNumL to be transferred to the device being upgraded. The FieldNumE is validated by the XferField function according to various rules as described in Section 29.1.4. If validation succeeds, the XferField function produces the data and signature for subsequent passing into the WriteFieldsAuth function for the device being upgraded.

[7518] The transfer field output consists of the new data for the field being upgraded, field data of the two sequence fields, and a signature. When a transfer output is produced, the sequence field data in SEQ_1 is decremented by 2 from the previous value (as passed in with the input), and the sequence field data in SEQ_2 is decremented by 1 from the previous value (as passed in with the input).

[7519] An additional InputParameterCheck value must be provided for the parameters not included in SIG_(E) if the transmission between the System and the Parameter Upgrader QA Device is error prone, and these errors are not corrected by the transmission protocol itself. InputParameterCheck is SHA-1[FieldNumL|FieldNumE], and is required to ensure the integrity of these parameters when these inputs are received by the Parameter Upgrader QA Device.

[7520] The XferField function must first calculate SHA-1[FieldNumL|FieldNumE], compare the calculated value to the value received (InputParameterCheck), and act upon the inputs only if the values match.
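A minimal sketch of this check, using the SHA-1 implementation in the Python standard library, is given below. The encoding of FieldNumL and FieldNumE as one byte each is an assumption made purely for illustration, since the byte layout of the concatenation is not stated here; the function names are likewise hypothetical.

```python
import hashlib
import hmac


def input_parameter_check(field_num_l: int, field_num_e: int) -> bytes:
    """Compute SHA-1[FieldNumL|FieldNumE] over an assumed one-byte encoding."""
    message = bytes([field_num_l, field_num_e])  # assumed layout: one byte per field number
    return hashlib.sha1(message).digest()


def inputs_intact(field_num_l: int, field_num_e: int, received_check: bytes) -> bool:
    """Recalculate the check value and compare it with the value received."""
    expected = input_parameter_check(field_num_l, field_num_e)
    return hmac.compare_digest(expected, received_check)
```

The System would compute `input_parameter_check` before transmission, and the receiving side would act on the inputs only when `inputs_intact` returns True. Note that this guards against transmission errors only; authentication of the input data relies on SIG_(E).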

[7521] 29.1.2 Input Parameters

[7522] Table 302 describes each of the input parameters for the XferField function.

Parameter | Description
KeyRef | For common key input and output signatures: KeyRef.keyNum = Slot number of the key to be used for testing the input signature and producing the output signature. SIG_(E) produced using K_(KeyRef.keyNum) by the QA Device being upgraded. SIG_(out) produced using K_(KeyRef.keyNum) for delivery to the QA Device being upgraded. KeyRef.useChipId = 0. For variant key input and output signatures: KeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(E) produced using a variant of K_(KeyRef.keyNum) by the QA Device being upgraded. SIG_(out) produced using a variant of K_(KeyRef.keyNum) for delivery to the QA Device being upgraded. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which generated SIG_(E) and will receive SIG_(out).
M0OfExternal | All 16 words of M0 of the QA Device being upgraded.
M1OfExternal | All 16 words of M1 of the QA Device being upgraded.
ChipId | ChipId of the QA Device being upgraded.
FieldNumL | M0 field number of the local (updating) device. The data stored in this field will be copied from the upgrading device.
FieldNumE | M0 field number of the QA Device being upgraded. This field will be updated to the value stored in FieldNumL within the upgrading device.
R_(E) | External random value used to verify the input signature. This will be the R from the input signature generator (i.e. the device generating SIG_(E)). The input signature generator in this case is the device being upgraded or a translation device.
R_(E2) | External random value used to produce the output signature. This will be the R obtained by calling the Random function on the device which will receive the SIG_(out) from the XferField function. The device receiving the SIG_(out) in this case is the device being upgraded or a translation device.
SIG_(E) | External signature required for authenticating input data. The input data in this case is the output from the Read function performed on the device being upgraded. A correct SIG_(E) = SIG_(KeyRef)(Data | R_(E) | R_(L)).

[7523] 29.1.2.1 Input Signature Verification Data Format

[7524] Refer to Section 27.1.2.1.

[7525] 29.1.3 Output Parameters

[7526] Table 303 describes each of the output parameters for the XferField function.

Parameter | Description
ResultFlag | Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1, Table 292 and Table 303.
FieldSelect | Selection of fields to be written. In this case the bits corresponding to SEQ_1, SEQ_2 and FieldNumE are set to 1. All other bits are set to 0.
FieldVal | Updated data words for the sequence data fields and FieldNumE for the QA Device being upgraded. Starts with LSW of lower field. This must be passed as input to the WriteFieldsAuth function of the QA Device being upgraded.
R_(L2) | Internal random value required to generate the output signature. This must be passed as input to the WriteFieldsAuth function or Translate function of the QA Device being upgraded.
SIG_(out) | Output signature which must be passed as an input to the WriteFieldsAuth function or Translate function of the QA Device being upgraded. SIG_(out) = SIG_(KeyRef)(data | R_(L2) | R_(E2)) as per FIG. 373.

[7527] 29.1.3.1 Output Signature Generation Data Format

[7528] Refer to Section 27.1.3.1.

[7529] 29.1.4 Function Sequence

[7530] The XferField command is illustrated by the following pseudocode:

[7531]
Accept input parameters - KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, R_(E), SIG_(E), R_(E2)
# Generate message for passing into ValidateKeyRefAndSignature function
data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|M0|M1)   # Refer to Figure 382.
# ----------------------------------------------------------------
# Validate KeyRef, and then verify signature
ResultFlag = ValidateKeyRefAndSignature(KeyRef, data, R_(E), R_(L))
If (ResultFlag ≠ Pass)
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Validate FieldNumE
# FieldNumE is present in the device being upgraded
PresentFlagFieldNumE ← GetFieldPresent(M1OfExternal, FieldNumE)
# Check FieldNumE present flag
If (PresentFlagFieldNumE ≠ 1)
    ResultFlag ← FieldNumEInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Check Seq fields exist and get their Field Number
# Get Seqdata field SEQ_1 for the device being upgraded
XferSEQ_1FieldNum ← GetFieldNum(M1OfExternal, SEQ_1)
# Check if the Seqdata field SEQ_1 is valid
If (XferSEQ_1FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
# Get Seqdata field SEQ_2 for the device being upgraded
XferSEQ_2FieldNum ← GetFieldNum(M1OfExternal, SEQ_2)
# Check if the Seqdata field SEQ_2 is valid
If (XferSEQ_2FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Check write permission for FieldNumE
PermOKFieldNumE ← CheckFieldNumEPerm(M1OfExternal, FieldNumE)
If (PermOKFieldNumE ≠ 1)
    ResultFlag ← FieldNumEWritePermInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Check that both SeqData fields have Decrement-Only permission with the same key
# that has write permission on FieldNumE
PermOKXferSeqData ← CheckSeqDataFieldPerms(M1OfExternal, XferSEQ_1FieldNum, XferSEQ_2FieldNum, FieldNumE)
If (PermOKXferSeqData ≠ 1)
    ResultFlag ← SeqWritePermInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Get SeqData SEQ_1 data from device being upgraded
GetFieldDataWords(XferSEQ_1FieldNum, XferSEQ_1DataFromDevice, M0OfExternal, M1OfExternal)
# Get SeqData SEQ_2 data from device being upgraded
GetFieldDataWords(XferSEQ_2FieldNum, XferSEQ_2DataFromDevice, M0OfExternal, M1OfExternal)
# ----------------------------------------------------------------
# FieldNumL (upgrade value) is a valid field in the upgrading device
PresentFlagFieldNumL ← GetFieldPresent(M1, FieldNumL)
If (PresentFlagFieldNumL ≠ 1)
    ResultFlag ← FieldNumLInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Get the CountRemaining field associated with the upgrade value field
# The CountRemaining field is the next higher field from the upgrade value field
FieldNumCountRemaining ← FieldNumL + 1
# FieldNumCountRemaining is a valid field in the upgrading device
PresentFlagFieldNumCountRemaining ← GetFieldPresent(M1, FieldNumCountRemaining)
If (PresentFlagFieldNumCountRemaining ≠ 1)
    ResultFlag ← CountRemainingFieldInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Check permission for upgrade value field. Only one key (different from
# KeyRef.keyNum) has write permissions to the field and no key has
# decrement permissions.
CheckOK ← CheckUpgradeKeyForField(FieldNumL, M1, KeyRef)
If (CheckOK ≠ 1)
    ResultFlag ← FieldNumEKeyPermInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Find the type attribute for FieldNumE
TypeFieldNumE ← FindFieldNumType(M1OfExternal, FieldNumE)
# Find the type attribute for FieldNumL (upgrade value)
TypeFieldNumL ← FindFieldNumType(M1, FieldNumL)
If (TypeFieldNumE ≠ TypeFieldNumL)
    ResultFlag ← TypeMismatch
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Check permissions for CountRemaining field
# Check upgrades are available in the CountRemaining field of the
# upgrading device, i.e. the value of CountRemaining is a non-zero positive number
CountRemainingOK ← CheckCountRemaining(FieldNumCountRemaining, M0, M1)
If (CountRemainingOK ≠ 1)
    ResultFlag ← NoUpgradesRemaining
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Get the size of FieldNumL (upgrade value)
If (FieldNumL = 0)
    FieldSizeOfFieldNumL ← MaxWordInM − M1[FieldNumL].EndPos
Else
    FieldSizeOfFieldNumL ← M1[FieldNumL−1].EndPos − M1[FieldNumL].EndPos
EndIf
# Get the size of FieldNumE (field being updated)
If (FieldNumE = 0)
    FieldSizeOfFieldNumE ← MaxWordInM − M1OfExternal[FieldNumE].EndPos
Else
    FieldSizeOfFieldNumE ← M1OfExternal[FieldNumE−1].EndPos − M1OfExternal[FieldNumE].EndPos
EndIf
# Check whether the device being upgraded can hold the upgrade value from FieldNumL
If (FieldSizeOfFieldNumE < FieldSizeOfFieldNumL)
    ResultFlag ← FieldNumESizeInsufficient
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# All checks complete
# Generate Seqdata for SEQ_1 and SEQ_2 fields
XferSEQ_1DataToDevice = XferSEQ_1DataFromDevice − 2
XferSEQ_2DataToDevice = XferSEQ_2DataFromDevice − 1
# Add DataSet to Xfer Entry Cache
AddDataSetToXferEntryCache(ChipId, FieldNumE, FieldNumL, XferSEQ_1DataFromDevice, XferSEQ_2DataFromDevice)
# Decrement CountRemaining field by one
DecrementField(FieldNumCountRemaining, M0)
# Get the upgrade value words from FieldNumL of the upgrading device
GetFieldDataWords(FieldNumL, UpgradeValue, M0, M1)
# Generate new field data words for FieldNumE. The upgrade value is copied to FieldDataE
FieldDataE ← UpgradeValue
# Generate FieldSelect and FieldVal for SeqData fields SEQ_1, SEQ_2 and FieldDataE
CurrentFieldSelect ← 0
FieldVal ← 0
GenerateFieldSelectAndFieldVal(FieldNumE, FieldDataE, XferSEQ_1FieldNum, XferSEQ_1DataToDevice, XferSEQ_2FieldNum, XferSEQ_2DataToDevice, FieldSelect, FieldVal)
# Generate message for passing into GenerateSignature function
data ← (RWSense|FieldSelect|ChipId|FieldVal)   # Refer to Figure 373.
# Create output signature for FieldNumE
SIG_(out) ← GenerateSignature(KeyRef, data, R_(L2), R_(E2))
Update R_(L2) to R_(L3)
ResultFlag ← Pass
Output ResultFlag, FieldSelect, FieldVal, R_(L2), SIG_(out)
Return
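The field-size comparison and the sequence data generation in the pseudocode above can be sketched in Python. This is an illustrative sketch only: it assumes each device's M1 is reduced to a plain list of EndPos values, that MaxWordInM is 15 for a 16-word vector, and that both field sizes are computed the same way from each device's own M1. The names field_size, can_hold_upgrade and next_sequence_values are hypothetical.

```python
MAX_WORD_IN_M = 15  # assumption: index of the highest word in a 16-word vector


def field_size(end_pos, field_num, max_word_in_m=MAX_WORD_IN_M):
    """Size of a field, following the EndPos arithmetic of the pseudocode.

    end_pos[i] stands in for M1[i].EndPos (hypothetical flat representation).
    """
    if field_num == 0:
        return max_word_in_m - end_pos[0]
    return end_pos[field_num - 1] - end_pos[field_num]


def can_hold_upgrade(end_pos_local, field_num_l, end_pos_external, field_num_e):
    """True when FieldNumE is at least as large as FieldNumL."""
    needed = field_size(end_pos_local, field_num_l)
    available = field_size(end_pos_external, field_num_e)
    return available >= needed


def next_sequence_values(seq1_from_device, seq2_from_device):
    """Sequence data written back on a transfer: SEQ_1 drops by 2, SEQ_2 by 1."""
    return seq1_from_device - 2, seq2_from_device - 1
```

A failed comparison corresponds to the FieldNumESizeInsufficient result; signature handling and the cache update are deliberately left out of this sketch.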

[7532] 29.1.4.1 CountRemainingOK

[7533] CheckCountRemainingFieldNumL(FieldNumCountRemaining, M1, M0)

[7534] This function checks permissions for the CountRemaining field and also checks that upgrades are available in the CountRemaining field of the upgrading device.
AuthRW ← M1[FieldNumCountRemaining].AuthRW
NonAuthRW ← M1[FieldNumCountRemaining].NonAuthRW
DOForKeys ← M1[FieldNumCountRemaining].DOForKeys[KeyNum]
Type ← M1[FieldNumCountRemaining].Type
If (AuthRW = 1) ∧ (NonAuthRW = 0) ∧ (DOForKeys = 1) ∧ (Type = TYPE_COUNT_REMAINING)
    PermOK ← 1
Else
    PermOK ← 0
    Return PermOK
EndIf
# Get the count-remaining value from the upgrading device
GetFieldDataWords(FieldNumCountRemaining, CountRemainingValue, M0, M1)
If (CountRemainingValue <= 0)
    PermOK ← 0
    Return PermOK
EndIf
PermOK ← 1
Return PermOK
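A compact Python sketch of this check is given below. It assumes the four attribute tests are combined conjunctively, and it uses a placeholder numeric value for TYPE_COUNT_REMAINING because the real encoding is not given in this section. The FieldAttrs container and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List

TYPE_COUNT_REMAINING = 0x01  # placeholder value, not taken from the specification


@dataclass
class FieldAttrs:
    auth_rw: int            # authenticated ReadWrite permission bit
    non_auth_rw: int        # non-authenticated ReadWrite permission bit
    do_for_keys: List[int]  # decrement-only flag, one entry per key slot
    type: int               # Type attribute of the field


def check_count_remaining(attrs, key_num, count_remaining_value):
    """Return 1 when permissions are as expected and upgrades remain, else 0."""
    perms_ok = (
        attrs.auth_rw == 1
        and attrs.non_auth_rw == 0
        and attrs.do_for_keys[key_num] == 1
        and attrs.type == TYPE_COUNT_REMAINING
    )
    if not perms_ok:
        return 0
    # Upgrades are available only while the count is a positive number.
    return 1 if count_remaining_value > 0 else 0
```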

[7535] 29.2 RollBackField

[7536] Input: KeyRef, _(M0)OfExternal, _(M1)OfExternal, ChipId, FieldNumL, FieldNumE, InputParameterCheck (optional), R_(E), SIG_(E)

[7537] Output: ResultFlag

[7538] Changes: M₀ and R_(L)

[7539] Availability: Parameter Upgrader QA Device

[7540] 29.2.1 Function Description

[7541] The RollBackField function is very similar to the RollBackAmount function, the only difference being that the RollBackField function adjusts the value of the count-remaining field associated with the upgrade value field of the upgrading device, instead of the upgrade value field itself. A successful rollback increments the count-remaining by 1.

[7542] The Parameter Upgrader QA Device checks that the Printer QA Device didn't actually receive the transfer message correctly, by comparing the sequence data field values read from the device with the values stored in the Xfer Entry cache. The sequence data field values read must match what was previously written using the StartRollBack function. After all checks are fulfilled, the Parameter Upgrader QA Device adjusts the count-remaining field associated with its FieldNumL.

[7543] An additional InputParameterCheck value must be provided for the parameters not included in SIG_(E) if the transmission between the System and the Parameter Upgrader QA Device is error prone and these errors are not corrected by the transmission protocol itself. InputParameterCheck is SHA-1[FieldNumL|FieldNumE], and is required to ensure the integrity of these parameters when they are received by the Parameter Upgrader QA Device.

[7544] The RollBackField function must first calculate SHA-1[FieldNumL|FieldNumE], compare the calculated value to the value received (InputParameterCheck), and act upon the inputs only if the values match.
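The InputParameterCheck comparison can be sketched in Python with the standard hashlib module. The byte encoding of FieldNumL|FieldNumE is not specified in this section, so the sketch assumes one byte per field number, concatenated in that order; the function names are hypothetical.

```python
import hashlib
import hmac


def input_parameter_check(field_num_l, field_num_e):
    """SHA-1 over FieldNumL|FieldNumE (assumed: one byte each, L first)."""
    return hashlib.sha1(bytes([field_num_l, field_num_e])).digest()


def inputs_are_intact(field_num_l, field_num_e, received_check):
    """Recompute the digest and compare it with the received value."""
    expected = input_parameter_check(field_num_l, field_num_e)
    return hmac.compare_digest(expected, received_check)
```

The System would compute input_parameter_check before sending, and the receiving device would call inputs_are_intact before acting on FieldNumL and FieldNumE.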

[7545] 29.2.2 Input Parameters

[7546] Table 305 describes each of the input parameters for the RollBackField function.
Parameter: Description
KeyRef: For common key input signature: KeyRef.keyNum = Slot number of the key to be used for testing the input signature. SIG_(E) produced using K_(KeyRef.keyNum) by the QA Device being upgraded. KeyRef.useChipId = 0. For variant key input signature: KeyRef.keyNum = Slot number of the key to be used for generating the variant key. SIG_(E) produced using a variant of K_(KeyRef.keyNum) by the QA Device being upgraded. KeyRef.useChipId = 1. KeyRef.chipId = ChipId of the device which generated SIG_(E).
_(M0)OfExternal: 16 words of _(M0) of the QA Device being upgraded which failed to upgrade.
_(M1)OfExternal: 16 words of _(M1) of the QA Device being upgraded which failed to upgrade.
ChipId: ChipId of the QA Device being upgraded which failed to upgrade.
FieldNumL: _(M0) field number of the local (upgrading) device whose value could not be copied to the device being upgraded.
FieldNumE: _(M0) field number of the QA Device being upgraded to which the upgrade value in FieldNumL couldn't be copied.
R_(E): External random value used to verify the input signature. This will be the R from the input signature generator (i.e. the device generating SIG_(E)). The input signature generator in this case is the device which failed to upgrade or a translation device.
SIG_(E): External signature required for authenticating input data. The input data in this case is the output from the Read function performed on the device which failed to upgrade. A correct SIG_(E) = SIG_(KeyRef)(Data | R_(E) | R_(L)).

[7547] 29.2.2.1 Input Signature Generation Data Format

[7548] Refer to Section 27.1.2.1 for details.

[7549] 29.2.3 Output Parameters

[7550] Table 306 describes each of the output parameters for RollBackField.
Parameter: Description
ResultFlag: Indicates whether the function completed successfully or not. If it did not complete successfully, the reason for the failure is returned here. See Section 12.1, Table 292, Table 304 and Table 295.

[7551] 29.2.4 Function Sequence

[7552] The RollBackField command is illustrated by the following pseudocode:

[7553]
Accept input parameters - KeyRef, M0OfExternal, M1OfExternal, ChipId, FieldNumL, FieldNumE, R_(E), SIG_(E)
# Generate message for passing into ValidateKeyRefAndSignature function
data ← (RWSense|MSelect|KeyIdSelect|ChipId|WordSelect|M0|M1)   # Refer to Figure 382.
# ----------------------------------------------------------------
# Validate KeyRef, and then verify signature
ResultFlag = ValidateKeyRefAndSignature(KeyRef, data, R_(E), R_(L))
If (ResultFlag ≠ Pass)
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Check Seq fields exist and get their Field Number
# Get Seqdata field SEQ_1 num for the device being upgraded
XferSEQ_1FieldNum ← GetFieldNum(M1OfExternal, SEQ_1)
# Check if the Seqdata field SEQ_1 is valid
If (XferSEQ_1FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
# Get Seqdata field SEQ_2 num for the device being upgraded
XferSEQ_2FieldNum ← GetFieldNum(M1OfExternal, SEQ_2)
# Check if the Seqdata field SEQ_2 is valid
If (XferSEQ_2FieldNum invalid)
    ResultFlag ← SeqFieldInvalid
    Output ResultFlag
    Return
EndIf
# ----------------------------------------------------------------
# Get SeqData SEQ_1 data from device being upgraded
GetFieldDataWords(XferSEQ_1FieldNum, XferSEQ_1DataFromDevice, M0OfExternal, M1OfExternal)
# Get SeqData SEQ_2 data from device being upgraded
GetFieldDataWords(XferSEQ_2FieldNum, XferSEQ_2DataFromDevice, M0OfExternal, M1OfExternal)
# Generate Seqdata for SEQ_1 and SEQ_2 fields with the data that is read
XferSEQ_1Data = XferSEQ_1DataFromDevice + 1
XferSEQ_2Data = XferSEQ_2DataFromDevice + 2
# Check Xfer Entry in cache is correct - dataset exists, Field data
# and sequence field data matches and Xfer State is correct
XferEntryOK ← CheckEntry(ChipId, FieldNumE, FieldNumL, XferSEQ_1Data, XferSEQ_2Data)
If (XferEntryOK = 0)
    ResultFlag ← RollBackInvalid
    Output ResultFlag
    Return
EndIf
# Increment associated CountRemaining by 1
IncrementCountRemaining(FieldNumCountRemaining)
# Update XferState in DataSet to complete/deleted
UpdateXferStateToComplete(ChipId, FieldNumE)
ResultFlag ← Pass
Output ResultFlag
Return
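The cache check at the heart of the rollback can be sketched in Python as follows. Signature validation is omitted, the cache is modelled as a dictionary keyed by (ChipId, FieldNumE), the state names "pending" and "complete" are invented for illustration, and the count-remaining field is taken to be FieldNumL + 1 as in the XferField pseudocode. All class and function names are hypothetical.

```python
class XferEntryCache:
    """Hypothetical in-memory stand-in for the Xfer Entry cache."""

    def __init__(self):
        self._entries = {}

    def add(self, chip_id, field_num_e, field_num_l, seq1, seq2):
        # Stores the sequence values read from the device at transfer time.
        self._entries[(chip_id, field_num_e)] = {
            "field_num_l": field_num_l, "seq1": seq1, "seq2": seq2,
            "state": "pending",
        }

    def check_entry(self, chip_id, field_num_e, field_num_l, seq1, seq2):
        entry = self._entries.get((chip_id, field_num_e))
        return (
            entry is not None
            and entry["state"] == "pending"
            and (entry["field_num_l"], entry["seq1"], entry["seq2"])
            == (field_num_l, seq1, seq2)
        )

    def mark_complete(self, chip_id, field_num_e):
        self._entries[(chip_id, field_num_e)]["state"] = "complete"


def roll_back_field(cache, count_remaining, chip_id, field_num_l, field_num_e,
                    seq1_from_device, seq2_from_device):
    """count_remaining maps a field number to its count-remaining value."""
    seq1 = seq1_from_device + 1
    seq2 = seq2_from_device + 2
    if not cache.check_entry(chip_id, field_num_e, field_num_l, seq1, seq2):
        return "RollBackInvalid"
    count_remaining[field_num_l + 1] += 1  # give the unused upgrade back
    cache.mark_complete(chip_id, field_num_e)
    return "Pass"
```

Marking the entry complete after a successful rollback means a repeated rollback request for the same transfer is rejected, so the count cannot be incremented twice.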

[7554] Example Sequence of Operations

[7555] 30 Concepts

[7556] The QA Chip Logical Interface devices do not initiate any activities themselves. Instead, the System reads data and a signature from various untrusted devices, sends the data and signature to a trusted device for validation of the signature, and then uses the data to perform operations required for printing, refilling, upgrading and key replacement. The System is therefore responsible for performing the functional sequences required for these operations. It formats all input parameters required for a particular function, calls the function with the input parameters on the appropriate QA Chip Logical Interface instance, and then processes or stores the output parameters from the function appropriately.

[7557] Validation of signatures is achieved by either of the following schemes:

[7558] Direct: the signature produced by an untrusted device is passed directly to the trusted device for validation. Direct validation requires the untrusted device to share a common key or a variant key with the trusted device. Refer to Section 7 for further details on common and variant keys.

[7559] Translation: the signature produced by an untrusted device is first validated by the translating device, and a new signature of the read data is produced by the translation device for validation by the trusted device. Several translation devices may be chained together: the first translation device validates the signature from the untrusted device, and the last translation device produces the final signature for validation by the trusted device. The translation device must share a common key or a variant key with the trusted and untrusted devices, and the translation devices must share such keys among themselves if several are chained together for signature validation.

[7560] 30.1 Representation

[7561] Each functional sequence consists of the following devices (refer to Section 4.3):

[7562] System.

[7563] A trusted QA Device, which may be a system trusted QA Device, a Parameter Upgrader QA Device, an Ink Refill QA Device, or a Key Programmer QA Device, depending on the function performed. This device is referred to as device A.

[7564] An untrusted QA Device, which may be a Printer QA Device or an Ink QA Device. This device is referred to as device B.

[7565] A translation QA Device, which is used if a translation scheme is used to validate signatures. This device is referred to as device C.

[7566] The command sequence produced by the system for further sequences will be documented as shown in Table 307.
TABLE 307 Command sequence representation
Sequence No: Sequence order.
Function: Device.FunctionName.
Parameters: Input parameters and their values. Output parameters and their description.

[7567] Therefore, a typical direct signature validation sequence can be represented by FIG. 386 and Table 308.

[7568] For a direct signature to be used, A and B must share a common or a variant key, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID).
TABLE 308 Command sequence for direct signature validation
Sequence No, Function, Parameters
1. A.Random
   Input: None
   Output: R_(A) = RL
2. B.Read
   Input: KeyRef = n1, SigOnly = 0, MSelect = Any one M, KeyIdSelect = 0, WordSelectForDesiredM = Any one word in the selected M, RE = R_(A)
   Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM], R_(B) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
3. A.Test
   Input: KeyRef = n2, DataLength = Length of MWords in words preformatted as per Section 16.1, Data = MWords preformatted as per Section 16.1, RE = R_(B), SIGE = SIG_(B)
   Output: ResultFlag = Pass/Fail
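The Random, Read and Test exchange of Table 308 can be simulated with a short Python sketch. Several things here are assumptions rather than details from this section: the signature function is modelled as HMAC-SHA1, random values are 20 bytes, each device replaces its local random value after a successful use, and the Section 16.1 data preformatting is skipped. The ordering of the signed message follows the two formulas quoted in the text, SIG_(out) = SIG(data | R_(L) | R_(E)) on the signing side and SIG_(E) = SIG(Data | R_(E) | R_(L)) on the verifying side. The QADevice class is a hypothetical stand-in.

```python
import hashlib
import hmac
import os


def sig(key, *parts):
    """Keyed signature over the concatenated parts (assumed: HMAC-SHA1)."""
    return hmac.new(key, b"".join(parts), hashlib.sha1).digest()


class QADevice:
    """Minimal stand-in for a QA Device holding one key and some M words."""

    def __init__(self, key, words=b""):
        self.key = key
        self.words = words
        self.r_l = os.urandom(20)  # local random value R_L

    def random(self):
        return self.r_l

    def read(self, r_e):
        # Signing side: data | R_L | R_E, then advance the local random.
        r_l = self.r_l
        signature = sig(self.key, self.words, r_l, r_e)
        self.r_l = os.urandom(20)
        return self.words, r_l, signature

    def test(self, data, r_e, sig_e):
        # Verifying side: Data | R_E | R_L, where R_L is this device's random.
        ok = hmac.compare_digest(sig(self.key, data, r_e, self.r_l), sig_e)
        if ok:
            self.r_l = os.urandom(20)  # a used challenge cannot be replayed
        return ok


def validate_direct(a, b):
    """Steps 1 to 3 of the direct validation sequence."""
    r_a = a.random()
    words, r_b, sig_b = b.read(r_a)
    return a.test(words, r_b, sig_b)
```

With a shared key, validate_direct(trusted, untrusted) returns True; a device holding a different key, or a replayed signature, fails the Test step.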

[7569] A typical signature validation using translation can be represented by FIG. 387 and Table 309.

[7570] For validating signatures using translation:

[7571] A and C must share a common or a variant key

[7572] i.e. C.K_(n3)=A.K_(n2) or C.K_(n3)=FormKeyVariant(A.K_(n2), C.ChipID).

[7573] B and C must share a common or a variant key

[7574] i.e. C.K_(n2)=B.K_(n1) or B.K_(n1)=FormKeyVariant(C.K_(n2), B.ChipID).
TABLE 309 Command sequence for signature validation using translation
Sequence No, Function, Parameters
1. C.Random
   Input: None
   Output: R_(C) = RL
2. B.Read
   Input: KeyRef = n1, SigOnly = 1 or 0, MSelect = any, KeyIdSelect = any, WordSelectForDesiredM = any, RE = R_(C)
   Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM], R_(B) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
3. A.Random
   Input: None
   Output: R_(A) = RL
4. C.Translate
   Input: InputKeyRef = n2, DataLength = Length of MWords in words preformatted as per Section 17.1, Data = MWords preformatted as per Section 17.1, RE = R_(B), SIGE = SIG_(B), OutputKeyRef = n3, RE2 = R_(A)
   Output: If ResultFlag = Pass then R_(C1) = R_(L2), SIG_(C) = SIGOut. Refer to Section 15.3.1.
5. A.Test
   Input: KeyRef = n2, DataLength = Length of MWords in words preformatted as per Section 16.1, Data = MWords preformatted as per Section 16.1, RE = R_(C1), SIGE = SIG_(C)
   Output: ResultFlag = Pass/Fail

[7575] 31 In Field Use

[7576] This section covers functional sequences for printer and ink QA Devices, as they perform their usual function of printing.

[7577] 31.1 Startup Sequence

[7578] At startup of any operation (a printer startup or an upgrade startup), the system determines the properties of each QA Device it is going to communicate with. These properties are:

[7579] Software version of the QA Device. This includes SoftwareReleaseIdMajor and SoftwareReleaseIdMinor. The SoftwareReleaseIdMajor identifies the functions available in the QA Device. Refer to Section 13.2 for details.

[7580] The number of memory vectors in the QA Device.

[7581] The number of keys in the QA Device.

[7582] The ChipId of the QA Device.

[7583] The properties allow the system to determine which functions are available in a given QA Device, as well as the value of input parameters required to communicate with the QA Device.

[7584] Table 310 shows the startup sequence.
TABLE 310 Startup command sequence
Sequence No, Function, Command
1. B.GetInfo
   Input: None
   Output: Major release identifier of the QA Device = SoftwareReleaseIdMajor, Minor release identifier of the QA Device = SoftwareReleaseIdMinor, Number of memory vectors in the QA Device = NumVectors, Number of keys in the QA Device = NumKeys, Id of the QA Device = ChipId, 0 = VarDataLen (no VarData in the case of an ink or printer QA Device)

[7585] 31.1.1 Clearing the Preauthorisation Field

[7586] Preauthorisation of ink is one of the schemes that a printer may use to decrement logical ink as physical ink is used. This is discussed in detail in Section 31.4.3.

[7587] If the printer uses preauthorisation, the system must read the preauthorisation field at startup. If the preauthorisation field is not clear, then the system must apply (decrement) the preauth amount to the corresponding ink field, by performing a non-authenticated write of the decremented amount to the appropriate ink field, and then clear the preauthorisation field by performing an authenticated write to the preauthorisation field.
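The startup handling of a leftover preauthorisation can be sketched in Python. The InkDeviceStub class is a hypothetical stand-in that only records which kind of write was issued; clamping the ink amount at zero is an assumption, since the text does not say what happens when the preauth amount exceeds the remaining ink.

```python
class InkDeviceStub:
    """Hypothetical ink QA Device model with one ink field and one preauth field."""

    def __init__(self, ink_remaining, preauth):
        self.ink_remaining = ink_remaining
        self.preauth = preauth
        self.log = []  # order of writes, for inspection

    def write_fields(self, ink_remaining):
        # Non-authenticated write of the decremented ink amount.
        self.ink_remaining = ink_remaining
        self.log.append("WriteFields")

    def write_fields_auth(self, preauth):
        # Authenticated write to the preauthorisation field.
        self.preauth = preauth
        self.log.append("WriteFieldsAuth")


def clear_preauth_at_startup(device):
    """Apply a pending preauth amount, then clear it. Returns True if applied."""
    if device.preauth == 0:
        return False
    device.write_fields(max(device.ink_remaining - device.preauth, 0))
    device.write_fields_auth(0)
    return True
```

The ink field is decremented before the preauth field is cleared, so a second power cut between the two writes still leaves the preauth set for the next startup.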

[7588] 31.2 Presence Only Authentication

[7589] The purpose of presence only authentication is to determine whether the printer should or shouldn't work with the ink cartridge.

[7590] 31.2.1 Without Data Interpretation

[7591] This sequence is performed when the printer authenticates the ink cartridge. The authentication consists of verifying a signature generated by the untrusted ink QA Device (in the ink cartridge) using the system's trusted QA Device.

[7592] For the signature to be valid, the trusted QA Device (A) and the untrusted ink QA Device (B) must share a common or a variant key, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID).

[7593] A single word of a single M is read because the system is only interested in the validity of the signature for the given data.

[7594] If the printer wants to verify the signature and doesn't require any data from the ink cartridge (because it is cached in the printer), then the printer calls the Read function with SigOnly set to 1. The Read returns only the signature of the data as requested by the input parameters. The printer then sends its cached data and the signature (from the Read function) to its trusted QA Device for verification. The printer may use this signature verification scheme if it has read the data previously from the ink QA Device, and the printer knows that the data in the ink QA Device has not changed from the value that was read earlier by the printer.

[7595] Table 311 shows the command sequence for performing presence only authentication requiring both data and signature.
Seq No, Function, Parameters
1. A.Random
   Input: None
   Output: R_(A) = RL
2. B.Read
   Input: KeyRef = n1, SigOnly = 0, MSelect = Any one M, KeyIdSelect = 0, WordSelectForDesiredM = Any one word in the selected M, RE = R_(A)
   Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM], R_(B) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
3. A.Test
   Input: KeyRef = n2, DataLength = Length of MWords in words preformatted as per Section 16.1, Data = MWords preformatted as per Section 16.1, RE = R_(B), SIGE = SIG_(B)
   Output: ResultFlag = Pass/Fail

[7596] 31.2.2 With Data Interpretation

[7597] This sequence is performed when the printer reads the relevant data from the untrusted QA Device in the ink cartridge. The system validates the signature from the external ink QA Device, and then uses this data for further processing.

[7598] For the signature to be valid, the trusted QA Device (A) and the untrusted QA Device (B) must share a common or a variant key, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID).

[7599] The data read helps the printer to determine the following before printing can commence:

[7600] Which fields in _(M0) store logical ink amounts in the ink QA Device.

[7601] The size of the ink fields in the ink QA Device. Refer to Section 8.1.1.1.

[7602] The type of ink.

[7603] The amount of ink in the field.

[7604] Table 312 shows the command sequence for performing presence only authentication (with data interpretation).
Seq No, Function, Parameters
1. A.Random
   Input: None
   Output: R_(A) = RL
2. B.Read
   Input: KeyRef = n1, SigOnly = 0, MSelect = 0x03 (indicates M0 and M1), KeyIdSelect = 0xFF (Read all KeyIds), WordSelectForDesiredM (for _(M0)) = 0xFFFF (Read all 16 _(M0) words), WordSelectForDesiredM (for _(M1)) = 0xFFFF (Read all 16 _(M1) words), RE = R_(A)
   Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM] (all 16 words of _(M0) and _(M1)), R_(B) = RL, SIG_(B) = SIGout. Refer to Section 15.3.1.
3. A.Test
   Input: Input Key = n2, DataLength = Length of MWords in words preformatted as per Section 16.1, Data = MWords preformatted as per Section 16.1, RE = R_(B), SIGE = SIG_(B)
   Output: ResultFlag = Pass/Fail

[7605] 31.2.2.1 Locating Ink Fields and Determining Ink Amounts Remaining

[7606] Before printing can commence, the printer must determine the ink fields in the ink cartridge so that it can decrement these fields with the physical use of ink. The printer must also verify that the ink in the ink cartridge is suitable for use by the printer.

[7607] This process requires reading data from the ink QA Device and then comparing the data to what is required. To perform the comparison the printer must store a list for each ink it uses.

[7608] The ink list must consist of the following:

[7609] Ink Id: An identifier for the ink.

[7610] KeyId: The KeyId of the key used to fill/refill this ink.

[7611] Type: This is the type attribute of the ink.

[7612] The ink list stored in the printer is shown in Table 313.
Ink Id, KeyId, Type
1 (represents black ink), 1 (represents KeyId of Network_OEM_InkFill/RefillKey^(b)), 0x55 (represents TYPE_REGULAR_BLACK_INK^(a))
2 (represents cyan ink), 1 (represents KeyId of Network_OEM_InkFill/RefillKey^(b)), 0x9F (represents TYPE_HIGHQUALITY_CYAN_INK^(a))
3 (represents magenta ink), 1 (represents KeyId of Network_OEM_InkFill/RefillKey^(b)), 0x9A (represents TYPE_HIGHQUALITY_MAGENTA_INK^(a))
4 (represents yellow ink), 1 (represents KeyId of Network_OEM_InkFill/RefillKey^(b)), 0x9C (represents TYPE_HIGHQUALITY_YELLOW_INK^(a))

[7613] a. These Types are only used as an example.

[7614] b. These KeyIds are only used as an example.

[7615] The printer will perform a Read of the ink QA Device's M0, M1 and KeyIds to determine the following:

[7616] The correct ink field (_(M0) field) in the ink QA Device.

[7617] The amount of ink-remaining in the field.

[7618] The ink QA Device's M1 and KeyId help the printer determine the location of the ink field, and the ink QA Device's M0 and M1 help determine the amount of ink-remaining in the field.

[7619] 31.2.2.2 FieldNum FindFieldNum(keyIdRequired, typeRequired)

[7620] This function returns a FieldNum of an M0 field, whose authenticated ReadWrite access key's KeyId is keyIdRequired, and whose Type attribute matches typeRequired. If no matching field is found it returns a FieldNum=255. This function must be available in the printer system so that it can determine the ink field required by it.

[7621] The function sequence is described below.
# Get total number of fields in the ink QA Device
FieldSize[16] ← 0   # Array to hold FieldSize assuming there are 16 fields
NumFields ← FindNumberOfFieldsInM0(M1, FieldSize)   # Refer to Section 19.4.1.
# Loop through KeyIds read assuming all KeyIds have been read from ink QA Device
For i ← 0 to 7
    # Check if KeyId read matches
    If (KeyId_(i) = keyIdRequired)
        # Matching KeyId found
        KeyNum ← i   # Get the KeyNum of the matching KeyId
        # Now look through the fields to check which field has
        # write permissions with this KeyNum
        For j ← 0 to NumFields
            AuthRW ← _(M1)[j].AuthRW   # Isolate AuthRW for field
            # Check authenticated write is allowed to the field
            If (AuthRW = 1)
                KeyNum_(j) ← _(M1)[j].KeyNum   # Isolate KeyNum of the field
                Type_(j) ← _(M1)[j].Type   # Isolate Type attribute of the field
                # Check if Key is write key for the field and the type matches
                If (KeyNum = KeyNum_(j)) ∧ (Type_(j) = typeRequired)
                    FieldNum ← j
                    return FieldNum
                EndIf
            EndIf
        EndFor   # Loop through to next field
        FieldNum ← 255   # Error - no field found
        return FieldNum
    EndIf
EndFor   # Loop through to next KeyId

[7622] For example, if the printer wants to find an ink field that matches Ink Id#2 (from Table 313) in the ink QA Device, it must call the function FindFieldNum with keyIdRequired=KeyId of Network_OEM_InkFill/Refill Key and typeRequired=TYPE_HIGHQUALITY_CYAN_INK.
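The lookup can be sketched in Python as shown below. The per-field attributes of M1 are modelled with a hypothetical FieldAttrs record, and the KeyIds are passed as a list indexed by key slot. Unlike the pseudocode, this sketch keeps scanning later key slots if the first slot with a matching KeyId yields no field, and it returns 255 when no KeyId matches at all; both are assumptions about the intended behaviour.

```python
from dataclasses import dataclass

NO_FIELD = 255  # value returned when no matching field exists


@dataclass
class FieldAttrs:
    auth_rw: int   # 1 if authenticated ReadWrite access is allowed
    key_num: int   # key slot with authenticated write access to the field
    type: int      # Type attribute of the field


def find_field_num(key_ids, fields, key_id_required, type_required):
    """Return the number of the first field writable with the required key
    whose Type matches, or NO_FIELD."""
    for key_num, key_id in enumerate(key_ids):
        if key_id != key_id_required:
            continue
        for field_num, attrs in enumerate(fields):
            if (attrs.auth_rw == 1
                    and attrs.key_num == key_num
                    and attrs.type == type_required):
                return field_num
    return NO_FIELD
```

Using the example values of Table 313, a printer looking for cyan ink would call find_field_num(key_ids, fields, 1, 0x9F), where key_ids and fields come from a validated Read of the ink QA Device.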

[7623] 31.2.2.3 Ink-Remaining Amount

[7624] This can be determined by using the function GetFieldDataWords(FieldNum, FieldData, M0, M1) described in Section 27.1.4.14. FieldNum must be set to the value returned from the function in Section 31.2.2.2. FieldData returns the ink-remaining amount.

[7625] The function GetFieldDataWords(FieldNum, FieldData, M0, M1) must be implemented in the printer system.

[7626] 31.3 Presence Only Authentication Through the Translate Function

[7627] This sequence is performed when the printer reads the data from the untrusted ink QA Device in the ink cartridge but uses a translating QA Device to indirectly validate the read data. The translating QA Device validates the signature using the key it shares with the untrusted QA Device, and then signs the data using the key it shares with the trusted QA Device. The trusted QA Device then validates the signature produced by the translating QA Device.

[7628] For validating signatures using translation:

[7629] A and C must share a common or a variant key

[7630] i.e. C.K_(n3)=A.K_(n2) or C.K_(n3)=FormKeyVariant(A.K_(n2), C.ChipID).

[7631] B and C must share a common or a variant key

[7632] i.e. C.K_(n2)=B.K_(n1) or B.K_(n1)=FormKeyVariant(C.K_(n2), B.ChipID).

[7633] Table 314 shows a command sequence for presence only authentication using translation.
Seq No, Function, Parameters
1. C.Random
   Input: None
   Output: R_(C) = RL
2. B.Read
   Input: KeyRef = n1, SigOnly = 1 or 0, MSelect = any M, KeyIdSelect = 0, WordSelectForDesiredM = any, RE = R_(C)
   Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM], R_(B) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
3. A.Random
   Input: None
   Output: R_(A) = RL
4. C.Translate
   Input: InputKeyRef = n2, DataLength = Length of MWords in words preformatted as per Section 17.1, Data = MWords preformatted as per Section 17.1, RE = R_(B), SIGE = SIG_(B), OutputKeyRef = n3, RE2 = R_(A)
   Output: If ResultFlag = Pass then R_(C1) = RL1, SIG_(C) = SIGOut. Refer to Section 15.3.1.
5. A.Test
   Input: KeyRef = n2, DataLength = Length of MWords in words preformatted as per Section 16.1, Data = MWords preformatted as per Section 16.1, RE = R_(C1), SIGE = SIG_(C)
   Output: ResultFlag = Pass/Fail

[7634] 31.4 Updating the Ink-Remaining

[7635] This sequence is performed when the printer is printing. The ink QA Device holds the logical amount of ink-remaining corresponding to the physical ink left in the cartridge. This logical ink amount must decrease as physical ink from the ink cartridge is used for printing.

[7636] 31.4.1 Sequence of Update

[7637] The primary question is when to deduct the logical ink amount: before or after the physical ink is used.

[7638] a. Print first (use physical ink) and then update the logical ink. If the power is cut off after a physical print and before a logical update, then the logical update is not performed. Therefore, the logical ink-remaining is more than the physical ink-remaining. Performing repeated power cuts will increase the differential amount, and finally any physical ink could be used to refill the QA Device.

[7639] b. Update the logical ink and then print (use physical ink). This is better than (a) because other physical inks cannot be used. However, if a problem occurs during printing, after the logical amount has already been deducted, there will be a disparity between logical and physical amounts. This might result in the printer not printing even if physical ink is present in the ink cartridge. The amount of disparity can be reduced by increasing the frequency of updating logical ink, i.e. updating after each line instead of after each page.

[7640] c. Preauthorise logical ink. Preauthorise a certain amount of ink (which depends on the frequency of logical updates) before printing and clear it at the end of printing. If power is cut off after a page is printed, then on startup the printer reads the preauthorisation field. If it has not been cleared, the printer applies the preauth amount to the ink-remaining amount, and then clears the preauthorisation field.

[7641] 31.4.2 Basic Update

[7642] Some printers may use one of the methods described in Section 31.4.1 (a) or (b) to update logical ink amounts in the ink QA Device. This method of updating the ink is termed a basic update. The decremented amount is written to the appropriate ink field (which has been previously determined using Section 31.2.2) in _(M0). The printer verifies the write by reading the signature of the written data, then passing it to the Test function of the trusted QA Device.

[7643] For the signature to be valid, the trusted QA Device (A) and ink QA Device (B) must share a common or a variant key, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID).
TABLE 315 Command sequence for updating the ink-remaining (basic)
Seq No, Function, Parameter
1. B.WriteFields
   Input: VectNum = 0, FieldSelect = Select bits corresponding to the ink fields (the ink field locations should have been determined before by using the method in Section 31.2.2.1), FieldVal = Decremented ink-remaining amount
   Output: ResultFlag = Pass/Fail
2. A.Random
   Input: None
   Output: R_(A) = RL
3. B.Read
   Input: KeyRef = n1, SigOnly = 1 (we only need the signature because we already know the data), MSelect = _(M0), KeyIdSelect = 0, WordSelectForDesiredM = corresponds to the ink fields written in Seq No 1, RE = R_(A)
   Output: If ResultFlag = Pass then SelectedWordsOfSelectedMs not returned because [SigOnly] = 1 in Seq 3, R_(B) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
4. A.Test
   Input: KeyRef = n2, DataLength = length in words as per Seq No 1 [MVal] preformatted as per Section 16.1, Data = as per Seq No 1 [MVal] preformatted as per Section 16.1, RE = R_(B), SIGE = SIG_(B)
   Output: ResultFlag = Pass/Fail

[7644] 31.4.3 Preauthorisation

[7645] This section describes the update of logical ink amounts using preauthorisation.

[7646] The basic preauthorisation sequence is as follows:

[7647] a. Preauthorise before the first print. The preauthorisation amount depends on the printer model. Example amounts could be the ink required for a fully covered A4 page or A3 page. The value corresponding to the preauth amount is written to the preauth field in the ink QA Device.

[7648] Note: The preauth value must be interpreted consistently across printer models, i.e. if a preauthorisation amount of one A4 page is set in the ink cartridge in printer1 (model1), and the ink cartridge is later placed in printer2 (model2) with its preauth still set, printer2 must deduct an A4 page's worth of ink from the ink-remaining amount.

[7649] b. Print the page.

[7650] c. Write the decremented logical amount to the ink field of the ink QA Device and validate the write by reading the signature of the ink field.

[7651] d. Repeat steps b and c until the last page has been printed.

[7652] e. Clear the preauth amount.

[7653] f. If the power is cut off before the preauth amount has been cleared, on start-up apply the preauth amount to the corresponding ink field by performing a non-authenticated write of the decremented amount, and then clear the preauth amount by performing an authenticated write of the preauth field.
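Steps a to f can be illustrated with a small state model. This is a sketch under stated assumptions: the class and function names are hypothetical, a preauth value of zero is taken to mean "cleared", signatures are omitted, and clamping the ink-remaining amount at zero is an assumption made for the example.

```python
class InkCartridge:
    """Hypothetical model of the two fields involved in preauthorisation."""

    def __init__(self, ink_remaining: int):
        self.ink_remaining = ink_remaining
        self.preauth = 0  # assumed: 0 means the preauth field is cleared


def start_print(cart: InkCartridge, preauth_amount: int) -> None:
    cart.preauth = preauth_amount            # step a (authenticated write)


def print_page(cart: InkCartridge, ink_used: int) -> None:
    cart.ink_remaining -= ink_used           # steps b and c


def end_print(cart: InkCartridge) -> None:
    cart.preauth = 0                         # step e (authenticated write)


def on_startup(cart: InkCartridge) -> None:
    # Step f: a preauth that is still set means the last job never finished.
    if cart.preauth != 0:
        # Assumption: the ink-remaining amount is clamped at zero.
        cart.ink_remaining = max(0, cart.ink_remaining - cart.preauth)
        cart.preauth = 0
```

In a normal job the preauth amount is never deducted. It is only charged when the printer finds the field still set at start-up, which bounds the ink that could be used without being accounted for after a power cut.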

[7654] 31.4.3.1 Set Up of the Preauth Field

[7655] Only a single preauth field must exist in an Ink QA Device.

[7656] The preauth field consists of a single _(M0) word, but can optionally be extended to two _(M0) words by using a different value of the type attribute. FIG. 388 shows the setup of the preauth field's attributes in _(M1).

[7657] The preauth field has authenticated ReadWrite access using the INK_USAGE_KEY, i.e. INK_USAGE_KEY can perform authenticated writes to this field. This key or its variant is shared between the ink QA Device and the printer QA Device to validate any data read from the ink cartridge. For the signature to be valid, B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where K_(n1)=INK_USAGE_KEY. The system performs a WriteAuth to the preauth field using this key, both to set up the preauth amount and to clear it.

[7658] The preauth field is identified by two attributes:

[7659] Type attribute—TYPE_PREAUTH. Refer to Appendix A.

[7660] The KeyId of the KeyNum attribute must be the same as the KeyId of the INK_USAGE_KEY which the printer uses to validate any data read from the ink QA Device.

[7661] The preauth field can be applied to a single ink field or to multiple ink fields.

[7662] 31.4.3.2 Preauth Applied to a Single Ink Field

[7663] In this case the entire preauth field is used to store the preauth amount and is only linked to one ink field.

[7664] 31.4.3.3 Preauth Applied to Multiple Ink Fields

[7665] Multiple preauth fields can be accommodated in a single _(M0) field by a scheme shown in FIG. 388A.

[7666] This scheme supports a maximum of 8 ink fields being present in the Ink QA Device.

[7667] The field in _(M0) is divided into two parts: the preauth field select and the preauth amount. Each bit in the preauth field select corresponds to a single ink field, and the preauth amount is the same for every ink field. If an ink cartridge uses multiple inks which are preauthorised, each of the inks has a corresponding preauth field bit. Before a particular ink is used for printing, the corresponding preauth field bit is set. The preauth amount is also set if the previous amount is zero. At finish, the preauth field bit is cleared. If more than one ink is used, the preauth bit for each ink field is set, and at finish each bit is cleared, with the last bit cleared also clearing the preauth amount.
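The select-bit scheme can be sketched with ordinary bit operations. The exact layout is given in FIG. 388A, which is not reproduced here, so the layout below (a 32-bit word with the 8 select bits in the top byte and the amount in the low 24 bits) is purely an assumption, as are the function names.

```python
SELECT_SHIFT = 24                       # assumed position of the select bits
AMOUNT_MASK = (1 << SELECT_SHIFT) - 1   # assumed 24-bit preauth amount


def unpack(word: int):
    """Split the word into (select bits, preauth amount)."""
    return (word >> SELECT_SHIFT) & 0xFF, word & AMOUNT_MASK


def pack(select: int, amount: int) -> int:
    return ((select & 0xFF) << SELECT_SHIFT) | (amount & AMOUNT_MASK)


def begin_ink(word: int, ink_index: int, preauth_amount: int) -> int:
    """Set the select bit for one ink; set the amount only if it was zero."""
    select, amount = unpack(word)
    if amount == 0:
        amount = preauth_amount
    return pack(select | (1 << ink_index), amount)


def finish_ink(word: int, ink_index: int) -> int:
    """Clear the select bit; the last bit cleared also clears the amount."""
    select, amount = unpack(word)
    select &= ~(1 << ink_index)
    if select == 0:
        amount = 0
    return pack(select, amount)
```

For example, starting from a cleared word, calling begin_ink for ink 0 and then ink 3 leaves both select bits set with the amount from the first call; the amount returns to zero only when the second of the two finish_ink calls clears the last bit.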

[7668] 31.4.3.4 Locating Preauth Fields and Determining Preauth Field Value

[7669] The preauth field can be located in the same manner as the ink field. To find the preauth field in the ink QA Device, the printer must call the function FindFieldNum (see Section 31.2.2.2) with keyIdRequired = KeyId of Network_OEM_Ink_Usage_Key and typeRequired = TYPE_PREAUTH. The preauth field value can be read in the same manner as the ink-remaining amount, using the function GetFieldDataWords(FieldNum, FieldData, M0, M1) described in Section 27.1.4.14. FieldNum must be set to the value returned from FindFieldNum, which in this case is the field number of the preauth field. FieldData returns the value of the preauth field.
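The lookup described above can be pictured as a search over per-field attributes followed by a slice of _(M0). The sketch below is hypothetical throughout: the real FindFieldNum and GetFieldDataWords are defined in Sections 31.2.2.2 and 27.1.4.14, and the attribute record, the simplified signatures and the word layout used here are assumptions made only to show the idea.

```python
from typing import List, NamedTuple, Optional, Sequence


class FieldAttr(NamedTuple):
    """Assumed, simplified view of one field's attributes decoded from M1."""
    field_type: str
    key_id: int
    first_word: int
    num_words: int


def find_field_num(attrs: Sequence[FieldAttr], key_id_required: int,
                   type_required: str) -> Optional[int]:
    # Return the number of the first field matching both KeyId and type.
    for field_num, attr in enumerate(attrs):
        if attr.key_id == key_id_required and attr.field_type == type_required:
            return field_num
    return None


def get_field_data_words(field_num: int, m0: Sequence[int],
                         attrs: Sequence[FieldAttr]) -> List[int]:
    # Return the M0 words that make up the selected field.
    attr = attrs[field_num]
    return list(m0[attr.first_word:attr.first_word + attr.num_words])
```

A printer would call find_field_num with the KeyId of its ink usage key and TYPE_PREAUTH, then pass the returned field number to get_field_data_words to obtain the preauth value.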

[7670] 31.4.3.5 Command Sequence

[7671] The command sequence can be broken up into three parts:

[7672] Start of print sequence.

[7673] During print sequence.

[7674] End of print sequence.

[7675] 31.4.3.5.1 Start of Print Sequence

[7676] This sets up the preauth amount before the start of printing.

[7677] Table 316 shows the command sequence for the start of print sequence. The first Random-Read-Test sequence determines the location of the preauth field in the ink QA Device and its value. The Random-SignM-WriteFieldsAuth sequence then writes the new preauth value to the preauth field.

TABLE 316. Updating the consumable remaining (preauth): start of print sequence

Random-Read-Test sequence to determine the location of the preauth field in the ink QA Device and its value:

Seq No 1: A.Random
Input: None.
Output: R_(A) = R_(L).

Seq No 2: B.Read
Input: KeyRef = n1; SigOnly = 0; MSelect = 0x03 (indicates M0 and M1); KeyIdSelect = 0xFF (read all KeyIds); WordSelectForDesiredM (for _(M0)) = 0xFFFF (read all 16 _(M0) words); WordSelectForDesiredM (for _(M1)) = 0xFFFF (read all 16 _(M1) words); RE = R_(A).
Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM]; R_(B) = R_(L); SIG_(B) = SIGout. Refer to Section 15.3.1.

Seq No 3: A.Test
Input: KeyRef = n2; DataLength = length of MWords in words, preformatted as per Section 16.1; Data = MWords as per Seq No 2, preformatted as per Section 16.1; RE = R_(B); SIGE = SIG_(B).
Output: ResultFlag = Pass/Fail.

Random-SignM-WriteFieldsAuth sequence to write the new preauth value:

Seq No 4: B.Random
Input: None.
Output: R_(B1) = R_(L).

Seq No 5: A.SignM
Input: KeyRef = n2; FieldSelect = select bit corresponding to the preauth field; FieldVal = new preauth value; ChipId = ChipId of B; R_(E) = R_(B1).
Output: If ResultFlag = Pass then R_(A1) = R_(L); SIG_(A) = SIGout. Refer to Section 27.1.3.1.

Seq No 6: B.WriteFieldsAuth
Input: KeyRef = n1; FieldSelect = same as Seq No 5 [FieldSelect]; FieldVal = same as Seq No 5 [FieldVal]; RE = R_(A1); SIGE = SIG_(A).
Output: ResultFlag = Pass/Fail.

[7678] 31.4.3.5.2

[7679] During Print Sequence

[7680] This set of commands is repeated at regular intervals to update logical ink amounts in the ink QA Device during printing.

[7681] Table 317 shows the command sequence for the print sequence. WriteFields writes the updated value to the ink field. Random-Read-Test reads back the value written and tests whether the value read matches the value written.

TABLE 317. Updating the consumable remaining (preauth): during print sequence

Write the decremented ink-remaining amount:

Seq No 7: B.WriteFields
Input: FieldSelect = select bits corresponding to the ink fields; FieldVal = decremented ink-remaining amount for a single ink field or multiple ink fields as per FieldSelect.
Output: ResultFlag = Pass/Fail.

Random-Read-Test sequence to read and verify the ink-remaining amount written:

Seq No 8: A.Random
Input: None.
Output: R_(A) = R_(L).

Seq No 9: B.Read
Input: KeyRef = n1; SigOnly = 1 (only the signature is needed because the data is already known); MSelect = 0x01 (only _(M0)); KeyIdSelect = 0; WordSelectForDesiredM = corresponds to the ink fields written in Seq No 7; RE = R_(A).
Output: If ResultFlag = Pass then SelectedWordsOfSelectedMs is not returned because [SigOnly] = 1 in Seq No 9; R_(B) = R_(L); SIG_(B) = SIGout. Refer to Section 15.3.1.

Seq No 10: A.Test
Input: KeyRef = n2; DataLength = length in words as per Seq No 7 [MVal], preformatted as per Section 16.1; Data = as per Seq No 7 [MVal], preformatted as per Section 16.1; RE = R_(B); SIGE = SIG_(B).
Output: ResultFlag = Pass/Fail.

[7682] 31.4.3.5.3 End of Print Sequence

[7683] This sequence clears the preauth amount before the print sequence is completed.

[7684] Table 318 shows the command sequence for the end of print sequence.

[7685] The preauth field is read using the Random-Read-Test sequence, and the preauth field is then cleared using the Random-SignM-WriteFieldsAuth sequence.

TABLE 318. Updating the consumable remaining (preauth): end of print sequence

Random-Read-Test sequence to read the preauth field and verify the preauth data:

Seq No 11: A.Random
Input: None.
Output: R_(A) = R_(L).

Seq No 12: B.Read
Input: KeyRef = n1; SigOnly = 1; MSelect = 0x01 (only M0); KeyIdSelect = 0; WordSelectForDesiredM (for _(M0)) = words corresponding to the preauth field that was written in Seq No 5 [FieldSelect] in Table 316; RE = R_(A).
Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per Seq No 12 [MSelect] and [WordSelectForDesiredM]; R_(B) = R_(L); SIG_(B) = SIGout. Refer to Section 15.3.1.

Seq No 13: A.Test
Input: KeyRef = n2; DataLength = length of MWords in words as per Seq No 12, preformatted as per Section 16.1; Data = MWords as per Seq No 12, preformatted as per Section 16.1; RE = R_(B); SIGE = SIG_(B).
Output: ResultFlag = Pass/Fail.

Random-SignM-WriteFieldsAuth sequence to clear the preauth field:

Seq No 14: B.Random
Input: None.
Output: R_(B1) = R_(L).

Seq No 15: A.SignM
Input: KeyRef = n2; FieldSelect = select bit corresponding to the preauth field; FieldVal = clear the preauth field; ChipId = ChipId of B; R_(E) = R_(B1).
Output: If ResultFlag = Pass then R_(A1) = R_(L); SIG_(A) = SIGout. Refer to Section 27.1.3.1.

Seq No 16: B.WriteFieldsAuth
Input: KeyRef = n1; FieldNum = same as Seq No 5 [FieldSelect]; FieldData = same as Seq No 5 [FieldVal]; RE = R_(B1); SIGE = SIG_(A).
Output: ResultFlag = Pass/Fail.

[7686] 31.4.4 Preauthorisation Through the Translate Function

[7687] This is performed when the system's trusted QA Device doesn't share a key with the ink QA Device, and uses a translating QA Device to Translate a Read from the ink QA Device and to Translate a SignM to the ink QA Device.

[7688] The basic translate principle involves translating the Read data from the untrusted QA Device into the Test data of the trusted QA Device, and translating the SignM data from the trusted QA Device into the WriteFieldsAuth data of the untrusted QA Device.

[7689] For validating signatures using translation:

[7690] The trusted QA Device (A) and the translating QA Device (C) must share a common or a variant key, i.e. C.K_(n3)=A.K_(n2) or C.K_(n3)=FormKeyVariant(A.K_(n2), C.ChipID).

[7691] The ink QA Device (B) and the translating QA Device (C) must share a common or a variant key, i.e. C.K_(n2)=B.K_(n1) or B.K_(n1)=FormKeyVariant(C.K_(n2), B.ChipID).

[7692] Only the start of print sequence is described using Translate. The rest of the sequences in preauthorisation can be modified to apply translation using this example.

[7693] Table 319 shows the command sequence for preauth (start of print sequence) using translation.

TABLE 319. Preauth (start of print sequence) using translate command

Random-Read-Random-Translate-Test sequence reads the location of the preauth field and its value using the translating QA Device C:

Seq No 1: C.Random
Input: None.
Output: R_(C) = R_(L).

Seq No 2: B.Read
Input: KeyRef = n1; SigOnly = 0; MSelect = 0x03 (indicates M0 and M1); KeyIdSelect = 0xFF (read all KeyIds); WordSelectForDesiredM (for _(M0)) = 0xFFFF (read all 16 _(M0) words); WordSelectForDesiredM (for _(M1)) = 0xFFFF (read all 16 _(M1) words); RE = R_(C).
Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM]; R_(B) = R_(L); SIG_(B) = SIGout. Refer to Section 15.3.1.

Seq No 3: A.Random
Input: None.
Output: R_(A) = R_(L).

Seq No 4: C.Translate
Input: InputKeyRef = n2; DataLength (in words) = length of MWords in words as per Seq No 2, preformatted as per Section 17.1; Data = MWords as returned from Seq No 2, preformatted as per Section 17.1; RE = R_(B); SIGE = SIG_(B); OutputKeyRef = n3; RE2 = R_(A).
Output: If ResultFlag = Pass then R_(C1) = R_(L2); SIG_(C) = SIGOut. Refer to FIG. 15.3.1.

Seq No 5: A.Test
Input: KeyRef = n2; DataLength = length of MWords in words as per Seq No 2, preformatted as per Section 16.1; Data = MWords as returned from Seq No 2, preformatted as per Section 16.1; RE = R_(C1); SIGE = SIG_(C).
Output: ResultFlag = Pass/Fail.

Random-SignM-Random-Translate-WriteFieldsAuth sequence to write the new preauth value using the translating QA Device C:

Seq No 6: C.Random
Input: None.
Output: R_(C2) = R_(L).

Seq No 7: A.SignM
Input: KeyRef = n2; FieldSelect = select bit corresponding to the preauth field; FieldVal = new value of the preauth field; ChipId = ChipId of B; R_(E) = R_(C2).
Output: If ResultFlag = Pass then R_(A1) = R_(L); SIG_(A) = SIGout. Refer to Section 27.1.3.1.

Seq No 8: B.Random
Input: None.
Output: R_(B1) = R_(L).

Seq No 9: C.Translate
Input: InputKeyRef = n3; DataLength (in words) = length in words as per Seq No 7 [FieldSelect], preformatted as per Section 17.1; Data = same as Seq No 7 [FieldVal], preformatted as per Section 17.1; RE = R_(A1); SIGE = SIG_(A); OutputKeyRef = n2; RE2 = R_(B1).
Output: If ResultFlag = Pass then R_(C3) = R_(L2); SIG_(C) = SIGOut. Refer to FIG. 15.3.1.

Seq No 10: B.WriteFieldsAuth
Input: KeyRef = n1; FieldNum = same as Seq No 7 [FieldSelect]; FieldData = same as Seq No 7 [FieldVal]; RE = R_(C3); SIGE = SIG_(C).
Output: ResultFlag = Pass/Fail.
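The translate principle used in Table 319 amounts to checking a signature under one key and, only if it is valid, re-signing the same data under a second key. The sketch below illustrates that idea; the function names, the HMAC-SHA1 stand-in for the signature function and the order in which nonces are combined are assumptions for the example rather than the Translate function defined elsewhere in this specification.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def sign(key: bytes, data: bytes, r_e: bytes, r_l: bytes) -> bytes:
    # Assumption: signature = HMAC-SHA1 over data | external nonce | local nonce.
    return hmac.new(key, data + r_e + r_l, hashlib.sha1).digest()


def translate(input_key: bytes, output_key: bytes, data: bytes,
              r_c: bytes, r_e: bytes, sig_e: bytes,
              r_e2: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Validate sig_e under input_key, then re-sign data under output_key.

    r_c  : nonce the translating device issued earlier (C.Random)
    r_e  : nonce of the device that produced sig_e
    r_e2 : nonce of the device that will check the translated signature
    """
    expected = sign(input_key, data, r_c, r_e)
    if not hmac.compare_digest(expected, sig_e):
        return None                      # ResultFlag = Fail
    r_l2 = os.urandom(8)                 # fresh local nonce for the output
    return r_l2, sign(output_key, data, r_e2, r_l2)
```

With this arrangement the trusted device never needs the ink device's key: it only verifies a signature made with the key it shares with the translating device.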

[7694] 31.5 Upgrading the Printer Parameters

[7695] This sequence is performed when a printer's operating parameter is upgraded.

[7696] The Parameter Upgrader QA Device stores the upgrade value, which is copied to the operating parameter field of the Printer QA Device, and the count-remaining associated with the upgrade value is decremented by 1 in the Parameter Upgrader QA Device.

[7697] The Parameter Upgrader QA Device outputs the data and signature only after completing all necessary checks for the upgrade.

[7698] 31.5.1 Basic

[7699] The basic upgrade is used when the Parameter Upgrader QA Device and the Printer QA Device being upgraded share a common key or a variant key, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the Printer QA Device and A is the Parameter Upgrader QA Device.

[7700] Therefore, the messages and their signatures generated by each of them can be correctly interpreted by the other.

[7701] The transfer sequence is performed using Random-Read-Random-XferField-WriteFieldsAuth. Table 320 shows the command sequence for a basic upgrade.

TABLE 320. Basic upgrade command sequence

Random-Read-Random-XferField-WriteFieldsAuth reads M0 and M1 of the QA Device being upgraded; the Parameter Upgrader QA Device produces the upgrade value for FieldNumE and the sequence data fields SEQ_1 and SEQ_2; these values are then written to the Printer QA Device:

Seq No 1: A.Random
Input: None.
Output: R_(A) = R_(L).

Seq No 2: B.Read
Input: KeyRef = n1; SigOnly = 0; MSelect = 3 (indicates _(M0) and _(M1)); KeyIdSelect = 0x00 (no KeyIds required); WordSelectForDesiredM (for _(M0)) = 0xFFFF (read all _(M0) words); WordSelectForDesiredM (for _(M1)) = 0xFFFF (read all _(M1) words); RE = R_(A).
Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM]; R_(B) = R_(L); SIG_(B) = SIGout. Refer to Section 15.3.1.

Seq No 3: B.Random
Input: None.
Output: R_(B1) = R_(L).

Seq No 4: A.XferField
Input: KeyRef = n2; _(M0)OfExternal = first 16 words of MWords; _(M1)OfExternal = last 16 words of MWords; ChipId = ChipId of B; FieldNumL = the field storing the upgrade value in the Parameter Upgrader QA Device (the value of this field will be copied to FieldNumE); FieldNumE = the field which will be upgraded in the Printer QA Device; RE = R_(B); R_(E2) = R_(B1); SIG_(E) = SIG_(B).
Output: If ResultFlag = Pass then FieldSelectB1 = FieldSelect (select bits for FieldNumE and the sequence data fields SEQ_1 and SEQ_2); FieldValB1 = FieldVal (new value for FieldNumE, copied from FieldNumL of the Parameter Upgrader QA Device, and the sequence data fields); R_(A1) = R_(L2); SIG_(A) = SIGout. Refer to Section 27.1.3.1.

Seq No 5: B.WriteFieldsAuth
Input: KeyRef = n1; FieldSelect = FieldSelectB1; FieldData = FieldValB1; RE = R_(A1); SIGE = SIG_(A).
Output: ResultFlag = Pass/Fail.

[7702] 31.5.2 Using the Translate Function

[7703] The upgrade through the Translate function is used when the Parameter Upgrader QA Device and the Printer QA Device don't share a key between them. The translating QA Device shares a key with the Parameter Upgrader QA Device and a second key with the Printer QA Device. Therefore the messages and their signatures, generated by the Parameter Upgrader QA Device and the Printer QA Device, are translated appropriately by the translating QA Device. The translating QA Device validates the Read from the Printer QA Device, and translates it for input to the XferField function. The translating QA Device will validate the output from the XferField function, and then translate it for input to the WriteFieldsAuth message of the Printer QA Device.

[7704] For validating signatures using translation:

[7705] The Parameter Upgrader QA Device (A) and the translating QA Device (C) must share a common or a variant key, i.e. C.K_(n3)=A.K_(n2) or C.K_(n3)=FormKeyVariant(A.K_(n2), C.ChipID).

[7706] The Printer QA Device (B) and the translating QA Device (C) must share a common or a variant key, i.e. C.K_(n2)=B.K_(n1) or B.K_(n1)=FormKeyVariant(C.K_(n2), B.ChipID).

[7707] Table 321 shows the command sequence for a basic refill usingtranslation. TABLE 321 An upgrade with translate command sequence Seq NoFunction CommandRandom-Read-Random-Translate-Random-XferField-Random-Translate-Random-WriteFieldsAuthreads M0 and M1 of the Printer QA Device using the translating QA DeviceC and then does a write of the upgrade value to FieldNumE and newsequence data to the seq data fields SEQ_1 and SEQ_2 field of thePrinter QA Device using the translating QA Device C. 1 C.Random NoneR_(C) = R_(L) 2 B.Read KeyRef = n1, SigOnly = 0, MSelect =0x03(indicates _(M0) and _(M1)), KeyIdSelect = 0x00 (no KeyIdsrequired), WordSelectForDesiredM (for _(M0))= 0xFFFF (Read all _(M0)words), WordSelectForDesiredM (for _(M1)) = 0xFFFF (Read all _(M1)words), R_(E) = R_(C) If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [ WordSelectForDesiredM],R_(B) = RL, SIG_(B) = SIGout Refer to Section 15.3.1 3 A.Random NoneR_(A) = R_(L) 4 C.Translate InputKeyRef = n2, DataLength = MWords lengthin words as per Seq No 2 preformatted as per Section 17.1, Data = MWordsas returned from Seq No 2 preformatted as per Section 17.1, RE = R_(B),SIGE = SIG_(B), OutputKeyRef = n3, RE2 = R_(A) If ResultFlag = Pass thenR_(C1), = RL2, SIG_(C) = SIGOut Refer to Section 17.3.1 5 C.Random NoneR_(C2) = R_(L) 6 A.XferField KeyRef = n2, _(M0)OfExternal = First 16words of MWords, _(M1)OfExternal = Last 16 words of MWords, ChipId =ChipId of B, FieldNumL = The field storing the upgrade value in theParameter Upgrader QA Device. FieldNumE = The field which will beupgraded in the Printer QA Device. 
R_(E) = R_(C1), R_(E2) = R_(C2),SIG_(E) = SIG_(C) If ResultFlag = Pass then FieldSelectB1 = FieldSelect− Select bits for FieldNumE and sequence fields, FieldValB1 = FieldVal −New Value for FieldNumE (Copied from FieldNumL of the Parameter UpgraderQA Device) and sequence fields SEQ_1 and SEQ_2, R_(A1) = R_(L2), SIG_(A)= SIGout Refer to Section 27.1.3.1 7 B.Random None R_(B1) = R_(L) 8C.Translate InputKeyRef = n3, DataLength = FieldValB1 length in words asper Seq No 6 preformatted as per Section 17.1, Data = FieldValB1 asreturned from Seq No 6 preformatted as per Section 17.1, RE = R_(A1),SIGE = SIG_(A), OutputKeyRef = n2, RE2 = R_(B1) If ResultFlag = Passthen R_(C3) = R_(L2), SIG_(C) = SIGOut Refer to Section 17.3.1 19B.WriteFieldsAuth KeyRef = n1, FieldSelect = FieldSelectB1, FieldVal =FieldValB1, RE = R_(C3), SIGE = SIG_(C) 10 ResultFlag = Pass/Fail

[7708] 31.6 Recovering From a Failed Upgrade

[7709] This sequence is performed if the upgrade failed (e.g. the Printer QA Device didn't receive the upgrade message correctly and hence didn't upgrade successfully). The Parameter Upgrader QA Device therefore needs to be rolled back to the previous value before the upgrade. In this case, the count-remaining associated with the upgrade value in the Parameter Upgrader QA Device is increased by one.

[7710] The Parameter Upgrader QA Device checks that the Printer QA Device didn't actually receive the message correctly using the StartRollBack function. The RollBackField function performs further comparisons of the sequence fields and FieldNumE of the Printer QA Device against values stored in the XferEntry cache. After performing all checks, the Parameter Upgrader QA Device increments the count-remaining field associated with the upgrade value field by one. Refer to Section 26 and Section 28 for details.

[7711] The rollback is started using the Random-Read-Random-StartRollBack-WriteFieldsAuth sequence, and the rollback of the Parameter Upgrader QA Device is performed using the Random-Read-RollBackField sequence.

[7712] Table 322 shows the command sequence for a rollback upgrade. SeqNo Function Command Random-Read-Random-StartRollBack-WriteFieldsAuthstarts the rollback and updates data for the sequence fields. 1 A.RandomNone R_(A) = RL 2 B.Read KeyRef = n1 , SigOnly = 0, MSelect =0x03(indicates _(M0) and _(M1)), KeyIdSelect = 0x00 (no KeyIdsrequired), WordSelectForDesiredM (for _(M0)) = 0xFFFF (Read all _(M0)words), WordSelectFor DesiredM (for _(M1)) = 0xFFFF(Read all _(M1)words), R_(E) = R_(A) If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectFor DesiredM], R_(B) =R_(L), SIG_(B) = SIGout Refer to Section 15.3.1 3 B.Random None R_(B1) =R_(L) 4 A.StartRoll KeyRef = n2, _(M0)OfExternal = First 16 words ofMWords, Back _(M1)OfExternal = Last 16 words of MWords, ChipId = ChipIdof B, FieldNumE= The field which was not upgraded in the Printer QADevice, FieldNumL = The upgrade value in the Parameter Upgrader QADevice which couldn't be copied to FieldNumE of the Printer QA Device,R_(E) = R_(B), R_(E2) = R_(B1), SIG_(E) = SIG_(B) If ResultFlag = Passthen FieldSelectB = FieldSelect − Select bits for sequence data fieldsSEQ_1 and SEQ_2, FieldValB = FieldVal − New values for SEQ_1 and SEEQ_2fields R_(A1) = R_(L2) SIG_(A) = SIGout Refer to Section 27.1.3.1. 5B.WriteFieldsAuth KeyRef = n1, FieldSelect = FieldSelectB, FieldData =FieldValB, RE = R_(A1), SIGE = SIG_(A) ResultFlag = Pass/FailRandom-Read-RollBackField performs a read of the QA Device beingupgraded, checks its values are as per Xfer Entry cache, and thenadjusts its count-remaining field. 
6 A.Random None R_(A2) = RL 7 B.ReadKeyRef = n1, SigOnly = 0, MSelect = 0x03 (indicates _(M0) and _(M1)),KeyIdSelect = 0x00 (no KeyIds required), WordSelectForDesiredM (for_(M0)) = 0xFFFF (Read all _(M0) words), WordSelectForDesiredM (for_(M1)) = 0xFFFF(Read all _(M1) words), R_(E) = R_(A2) If ResultFlag =Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and[WordSelectForDesiredM], R_(B2) = RL, SIG_(B) = SIGout Refer to Section15.3.1 8 A.Roll Back KeyRef = n2, _(M0)OfExternal = First 16 words ofMWords, Field _(M1)OfExternal = Last 16 words of MWords, ChipId = ChipIdof B, FieldNumE = The field which was not upgraded in the Printer QADevice, FieldNumL = The upgrade value in the Parameter Upgrader QADevice which couldn't be copied to FieldNumE of the Printer QA Device,R_(E) = R_(B2), SIG_(E) = SIG_(B) ResultFlag = Pass/Fail
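The count-remaining bookkeeping behind an upgrade and its rollback can be sketched as below. This is a deliberately reduced model: the class and method names are hypothetical, signatures and the sequence data fields SEQ_1 and SEQ_2 are omitted, and the cache is assumed to hold only the printer's pre-upgrade value, whereas the XferEntry cache described in this specification holds more.

```python
from typing import Optional


class ParameterUpgrader:
    """Hypothetical model of the Parameter Upgrader QA Device."""

    def __init__(self, upgrade_value: int, count_remaining: int):
        self.upgrade_value = upgrade_value
        self.count_remaining = count_remaining
        self.xfer_cache: Optional[int] = None  # assumed: printer's old value

    def xfer_field(self, printer_value: int) -> Optional[int]:
        # Release the upgrade value and consume one upgrade credit.
        if self.count_remaining <= 0:
            return None
        self.count_remaining -= 1
        self.xfer_cache = printer_value
        return self.upgrade_value

    def roll_back_field(self, printer_value_now: int) -> bool:
        # Restore the credit only if the printer still holds its old value,
        # i.e. the upgrade demonstrably did not take effect.
        if self.xfer_cache is None or printer_value_now != self.xfer_cache:
            return False
        self.count_remaining += 1
        self.xfer_cache = None
        return True
```

Clearing the cache after a successful rollback prevents the same failed transfer from being credited back twice.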

[7713] 31.7 Re/Filling the Consumable (INK)

[7714] This sequence is performed when an ink cartridge is first manufactured (filled) or after all the physical ink has been used (refilled). The re/fill protocol is used to transfer the logical ink from the Ink Refill QA Device to the Ink QA Device in the ink cartridge.

[7715] The Ink Refill QA Device stores the amount of logical ink corresponding to the physical ink in the refill station. During the refill, the required logical amount (corresponding to the physical transfer amount) is transferred from the Ink Refill QA Device to the Ink QA Device.

[7716] The Ink Refill QA Device outputs the transfer data only after completing all necessary checks to ensure that the correct logical ink type is being transferred, e.g. that Network_OEM1_infrared ink is not transferred to Network_OEM2_cyan ink. Refer to the XferAmount command in Section 27.1.

[7717] 31.7.1 Basic Refill

[7718] The basic refill is used when the Ink Refill QA Device and the Ink QA Device share a common key or a variant key, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the Ink QA Device and A is the Ink Refill QA Device. Therefore, the messages and their signatures generated by each of them can be correctly interpreted by the other.

[7719] The Xfer Sequence is started using Random-Read-Random-StartXfer-WriteAuth, and the Xfer Amount is written to the QA Device being refilled using the Random-Read-Random-XferAmount-WriteFieldsAuth sequence. Table 323 shows the command sequence for a basic refill.

TABLE 323. Command sequence for a basic refill

Random-Read-Random-XferAmount-WriteFieldsAuth reads M0 and M1 of the Ink QA Device being refilled, produces the updated amount for FieldNumE and the sequence data fields by calling XferAmount on the Ink Refill QA Device, and finally writes the updated value to the Ink QA Device using WriteFieldsAuth:

Seq No 1: A.Random
Input: None.
Output: R_(A) = R_(L).

Seq No 2: B.Read
Input: KeyRef = n1; SigOnly = 0; MSelect = 0x03 (indicates _(M0) and _(M1)); KeyIdSelect = 0x00 (no KeyIds required); WordSelectForDesiredM (for _(M0)) = 0xFFFF (read all _(M0) words); WordSelectForDesiredM (for _(M1)) = 0xFFFF (read all _(M1) words); RE = R_(A).
Output: If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM]; R_(B) = R_(L); SIG_(B) = SIGout. Refer to Section 15.3.1.

Seq No 3: B.Random
Input: None.
Output: R_(B1) = R_(L).

Seq No 4: A.XferAmount
Input: KeyRef = n2; _(M0)OfExternal = first 16 words of MWords; _(M1)OfExternal = last 16 words of MWords; ChipId = ChipId of B; FieldNumL = ink-remaining field of the Ink Refill QA Device; FieldNumE = ink-remaining field of the Ink QA Device; XferValLength = length in words of XferVal; XferVal = value to be transferred from the Ink Refill QA Device to the Ink QA Device being refilled; R_(E) = R_(B); R_(E2) = R_(B1); SIG_(E) = SIG_(B).
Output: If ResultFlag = Pass then FieldSelectB1 = FieldSelect (select bits for FieldNumE and the sequence data fields SEQ_1 and SEQ_2); FieldValB1 = FieldVal (new value for FieldNumE, transferred from FieldNumL of the Ink Refill QA Device, and the sequence data fields SEQ_1 and SEQ_2); R_(A1) = R_(L2); SIG_(A) = SIGout. Refer to Section 27.1.3.1.

Seq No 5: B.WriteFieldsAuth
Input: KeyRef = n1; FieldSelect = FieldSelectB1; FieldData = FieldValB1; RE = R_(A1); SIGE = SIG_(A).
Output: ResultFlag = Pass/Fail.

[7720] 31.7.2 Using the Translate Function

[7721] The refill through the Translate function is used when the Ink Refill QA Device and the Ink QA Device don't share a key between them. The translating QA Device shares a key with the Ink Refill QA Device and a second key with the Ink QA Device. Therefore the messages and their signatures, generated by the Ink Refill QA Device and the Ink QA Device, are translated appropriately by the translating QA Device. The translating QA Device validates the Read from the Ink QA Device, and translates it for input to the XferAmount function. The translating QA Device will validate the output from the XferAmount function, and then translate it for input to the WriteFieldsAuth message of the Ink QA Device.

[7722] For validating signatures using translation:

[7723] The Ink Refill QA Device (A) and the translating QA Device (C) must share a common or a variant key, i.e. C.K_(n3)=A.K_(n2) or C.K_(n3)=FormKeyVariant(A.K_(n2), C.ChipID).

[7724] The Ink Refill QA Device being refilled (B) and the translatingQA Device (C) must share a common or a variant key i.e C.K_(n2)=B.K_(n1)or B.K_(n1)=FormKeyVariant(C.K_(n2), B.ChipID). TABLE 324 A basic refillusing translation command sequence Seq No Function CommandRandom-Read-Random-Translate-Random-XferAmount-Random-Translate-Random-WriteFieldsAuth -reads M0 and M1 of the Ink QA Device being refilled using thetranslating QA Device C, produce updated amount for FieldNumE andsequence data field by calling XferAmount on Ink Refill QA Device, andfinally writing the updated value to Ink QA Device using the translatingQA Device. 1 C.Random None R_(C) = R_(L) 2 B.Read KeyRef = n1, SigOnly =0, MSelect = 0x03(indicates _(M0) and _(M1)), KeyIdSelect = 0x00 noKeyIds required), WordSelectForDesiredM (for _(M0)) = 0xFFFF (Read all_(M0) words), WordSelectForDesiredM (for _(M1)) = 0xFFFF (Read all _(M1)words), R_(E) = R_(C) If ResultFlag = Pass then MWords =SelectedWordsOfSelectedMs as per input [MSelect] and[WordSelectForDesiredM], R_(B) = R_(L), SIG_(B) = SIGout Refer toSection 15.3.1 3 A.Random None R_(A) = R_(L) 4 C.Translate InputKeyRef =n2, DataLength = MWords length in words as per Seq No 2 preformatted asper Section 17.1, Data = MWords as returned from Seq No 2 preformattedas per Section 17.1, RE = R_(B), SIGE = SIG_(B), OutputKeyRef = n3, RE2= R_(A) If ResultFlag = Pass then R_(C1) = R_(L2), SIG_(C) = SIGOutRefer to Section 17.3.1 5 C.Random None R_(L) = R_(C2) 6 A.XferAmountKeyRef = n2, _(M0)OfExternal = First 16 words of MWords, _(M1)OfExternal= Last 16 words of MWords, ChipId = ChipId of B, FieldNumL =ink-remaining field of the Ink Refill QA Device, FieldNumE =ink-remaining field of the Ink QA Device, XferValLength = length inwords of XferVal XferVal = Value to be transferred from Ink Refill QADevice to Ink QA Device being refilled, R_(E) = R_(C1), R_(E2) = R_(C2),SIG_(E) = SIG_(C) If ResultFlag = Pass then FieldSelectB1 = FieldSelect− Select bits for 
FieldNumE and sequence data field SEQ_1 and SEQ_2,FieldValB1 = FieldVal − New Value or FieldNumE (transferred fromFieldNumL of the Ink Refill QA Device) and sequence data fields SEQ_1and SEQ_2, R_(A1) = R_(L2), SIG_(A) = SIGout Refer to Section 27.1.3.1 7B.Random None R_(B1) = R_(L) 8 C.Translate InputKeyRef = n3, DataLength= FieldValB length in words as per Seq No 6 preformatted as per Section17.1, Data = FieldValB as returned from Seq No 6 preformatted as perSection 17.1, RE = R_(A1), SIGE = SIG_(A), OutputKeyRef = n2, RE2 =R_(B1) If ResultFlag = Pass then R_(C3) = RL2, SIG_(C) = SIGOut Refer toSection 17.3.1 9 B.WriteFieldsAuth KeyRef = n1, FieldSelect =FieldSelectB, FieldData = FieldValB, RE = R_(C3), SIGE = SIG_(C)ResultFlag = Pass/Fail

[7725] 31.8 Recovering From a Failed Refill

[7726] This sequence is performed if the refill failed (e.g. the Ink QA Device didn't receive the refill message correctly and hence didn't refill successfully). The Ink Refill QA Device therefore needs to be rolled back to the previous value before the refill.

[7727] The Ink Refill QA Device checks that the Ink QA Device didn't actually receive the message correctly using the StartRollBack function. The RollBackAmount function performs further comparisons of the sequence data fields and FieldNumE of the Ink QA Device against values stored in the XferEntry cache. After performing all checks, the Ink Refill QA Device adjusts its ink field to the value it held before the transfer request was processed. Refer to Section 26 and Section 28 for details.

[7728] The rollback is started using the Random-Read-Random-StartRollBack-WriteFieldsAuth sequence, and the rollback of the Ink Refill QA Device is performed using the Random-Read-RollBackAmount sequence.

TABLE 325 Rollback amount command sequence
Seq No | Function | Command
Random-Read-Random-StartRollBack-WriteAuth starts the rollback and updates data for the sequence data fields SEQ_1 and SEQ_2.
1 | A.Random | None. R_(A) = R_(L)
2 | B.Read | KeyRef = n1, SigOnly = 0, MSelect = 0x03 (indicates M0 and M1), KeyIdSelect = 0x00 (no KeyIds required), WordSelectForDesiredM (for M0) = 0xFFFF (read all M0 words), WordSelectForDesiredM (for M1) = 0xFFFF (read all M1 words), R_(E) = R_(A). If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM], R_(B) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
3 | B.Random | None. R_(B1) = R_(L)
4 | A.StartRollBack | KeyRef = n2, M0OfExternal = first 16 words of MWords, M1OfExternal = last 16 words of MWords, ChipId = ChipId of B, FieldNumL = ink-remaining field of the Ink Refill QA Device which will be adjusted to the value before the failed refill, FieldNumE = ink-remaining field of the Ink QA Device which failed to refill, R_(E) = R_(B), R_(E2) = R_(B1), SIG_(E) = SIG_(B). If ResultFlag = Pass then FieldSelectB = FieldSelect (select bits for sequence data fields SEQ_1 and SEQ_2), FieldValB = FieldVal (new value for sequence data fields SEQ_1 and SEQ_2), R_(A1) = R_(L2), SIG_(A) = SIGout. Refer to Section 27.1.3.1.
5 | B.WriteFieldsAuth | KeyRef = n1, FieldSelect = FieldSelectB in Seq No 4, FieldData = FieldValB in Seq No 4, R_(E) = R_(A1), SIG_(E) = SIG_(A). ResultFlag = Pass/Fail.
Random-Read-RollBackAmount performs a read of the Ink QA Device, checks its values are as per the XferEntry cache, and then adjusts its ink-remaining field.
11 | A.Random | None. R_(A2) = R_(L)
12 | B.Read | KeyRef = n1, SigOnly = 0, MSelect = 0x03 (indicates M0 and M1), KeyIdReq = 0 (not required), KeyIdSelect = 0x00 (no KeyIds required), WordSelectForDesiredM (for M0) = 0xFFFF (read all M0 words), WordSelectForDesiredM (for M1) = 0xFFFF (read all M1 words), R_(E) = R_(A2). If ResultFlag = Pass then MWords = SelectedWordsOfSelectedMs as per input [MSelect] and [WordSelectForDesiredM], R_(B2) = R_(L), SIG_(B) = SIGout. Refer to Section 15.3.1.
13 | A.RollBackAmount | KeyRef = n2, M0OfExternal = first 16 words of MWords, M1OfExternal = last 16 words of MWords, ChipId = ChipId of B, FieldNumL = ink-remaining field of the Ink Refill QA Device which will be adjusted to the value before the failed refill, FieldNumE = ink-remaining field of the Ink QA Device which failed to refill, R_(E) = R_(B2), SIG_(E) = SIG_(B). ResultFlag = Pass/Fail.

[7729] 31.9 Upgrading/Refilling/Filling the Upgrader

[7730] This sequence is performed when a count-remaining field in the Parameter QA Device must be updated or when the ink-remaining field in the Ink Refill QA Device requires filling or refilling.

[7731] In the case of the Parameter QA Device, another Parameter Upgrader Refill QA Device transfers its count-remaining value to the Parameter QA Device using the transfer sequence described in Section 31.4. Also refer to Section 28.6. This means the count-remaining in the Parameter Upgrader Refill QA Device must be decremented by the same amount that the Parameter Upgrader QA Device is incremented by, i.e. a credit transfer occurs.

[7732] In the case of the Ink Refill QA Device, another Ink Refill QA Device transfers its ink-remaining value to the Ink Refill QA Device using the transfer sequence described in Section 31.4. Also refer to Section 26.4. This means the logical ink-remaining in the transferring Ink Refill QA Device must be decremented by the same amount that the QA Device being refilled is incremented by, i.e. a credit transfer occurs.
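The credit transfer described above can be illustrated with a minimal Python sketch. It models only the bookkeeping rule (the source is decremented by exactly the amount the target is incremented by) and none of the authentication steps of Section 31.4. The class `QADevice` and the function `credit_transfer` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class QADevice:
    """Hypothetical stand-in for a QA Device holding one logical amount."""
    name: str
    remaining: int  # ink-remaining or count-remaining, in logical units


def credit_transfer(source: QADevice, target: QADevice, amount: int) -> None:
    """Move `amount` from source to target so that the total is conserved."""
    if amount <= 0:
        raise ValueError("transfer amount must be positive")
    if source.remaining < amount:
        raise ValueError("source does not hold enough credit")
    source.remaining -= amount  # decrement the transferring device ...
    target.remaining += amount  # ... by the same amount the target gains


upstream = QADevice("transferring Ink Refill QA Device", remaining=1000)
refilled = QADevice("Ink Refill QA Device being refilled", remaining=50)
credit_transfer(upstream, refilled, 200)
```

The same rule would apply to count-remaining values moved between Parameter Upgrader QA Devices.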

[7733] 32 Setting Up for Field Use

[7734] This section describes how to set up the data structures in the QA Device correctly for field use. All data structures are first programmed to factory values. Some of the data structures can then be changed to application specific values at the ComCo or the OEM, while others are set to fixed values.

[7735] 32.1 Instantiating the QA Chip Logical Interface

[7736] This sequence is performed when the QA Device is first created. Table 326 shows the data structure on final program load.

TABLE 326 Data structure set up during final program load
Data Structure Name | Value Set to | Fixed or Updatable
ChipId | Unique identifier for QA Device | Fixed
NumKey | Number of keys the QA Device can hold | Fixed
K_(n) | All K_(n) = K_(batch). The K_(batch) is unique for a production batch^(a). | Updateable if previous value is known
KeyId | All KeyIds = KeyId of K_(batch). | Updateable along with K_(n)
KeyLock | All KeyLock = unlocked | Updateable
NumVectors | Number of memory vectors in the QA Device | Fixed
M0 | Set to zeros | Updateable
M1 | Set to zeros | Updateable
M₂₊ | Set to zeros | Updateable
P_(n) | Set to ones | Updateable
R | Set to an initial random value | Updateable

[7737] Each key slot has the same K_(batch). If each key slot had a different K_(batch), and any one of the K_(batch) keys was compromised, then the entire batch would be compromised until that K_(batch) was replaced with another key. Hence, giving each key slot a different K_(batch) offers no security advantage but requires more keys to be managed.

[7738] 32.2 Setting Up Application Specific Data

[7739] This section defines the sequences for configuring the data structures in the QA Device with application specific data.

[7740] 32.2.1 Replacing Keys

[7741] The QA Devices are programmed with production batch keys at final program load. The COMCO keys replace the production batch keys before the QA Devices are shipped to the ComCo. The ComCo replaces the COMCO keys with COMCO_OEM keys when shipping QA Devices to its OEMs. The OEM replaces the COMCO_OEM keys with COMCO_OEM app keys as the QA Devices are placed in ink cartridges or printers.

[7742] The replacement occurs without the ComCo or the OEM knowing the actual value of the key. The actual value of the keys is known only to QACo. The ComCo or the OEM is able to perform these replacements because the QACo provides them with a key programming QA Device, with keys appropriately set, which can generate the necessary messages and signatures to replace the old key with the new key.

[7743] Table 327 shows the command sequence for ReplaceKey. GetProgramKey gets the new encrypted key from the key programming QA Device, and the encrypted new key is passed into the QA Device whose key is being replaced through the ReplaceKey function. Depending on the OldKeyRef and NewKeyRef objects, a common encrypted key or a variant encrypted key can be produced for the ReplaceKey function.

TABLE 327 ReplaceKey command sequence
Seq No | Function | Command
1 | B.Random | None. R_(B) = R_(L)
2 | A.GetProgramKey | OldKeyRef = Key Num of the old key (this key must be changed to the NewKeyRef in the QA Device whose key is being replaced), ChipId = chip identifier of the QA Device whose key is being replaced, R_(E) = R_(B), KeyLock = set depending on whether the new key is the final key for the key slot or will be replaced further, NewKeyRef = Key Num of the new key (this key will change the OldKeyRef in the QA Device whose key is being replaced). If ResultFlag = Pass then R_(A) = R_(L), KeyId_(new) = KeyIdOfNewKey, EncryptedNewKey = EncryptedKey, SIG_(A) = SIGout. Refer to Section 22.2.1.
3 | B.ReplaceKey | KeyNumToBeReplaced = old key number (the old key could be a common key or a variant key), KeyId = KeyId_(new), EncryptedKey = EncryptedNewKey, R_(E) = R_(A), SIG_(E) = SIG_(A). ResultFlag = Pass/Fail.

[7744] 32.2.2 Setting up ReadOnly Data

[7745] This sets the permanent functional parameters of the application where the QA Device has been placed. These parameters remain unchanged for the lifetime of the QA Device. In the case of an ink cartridge, such parameters are the colour and viscosity of the ink. These values are written to M₂₊ memory vectors using the WriteM1+ function, and their permissions are set to ReadOnly by the SetPerm function. These values are typically set at the OEM.

[7746] Table 328 shows the command sequence for setting up ReadOnly data.

TABLE 328 ReadOnly data setup command sequence
Seq No | Function | Command
1 | B.WriteM1+ | VectNum = 2 or 3, WordSelect = the selected words to be written, MVal = words corresponding to word select starting from LSW. ResultFlag = Pass/Fail.
2 | B.SetPerm | VectNum = same as Seq No 1 parameter [VectNum], PermVal = same as Seq No 1 parameter [WordSelect]. If ResultFlag = Pass then CurrPerm = NewPerm (current permission value after applying PermVal).

[7747] In the case of the SBR4320, the values written to M₂₊ memory vectors are write-once only, i.e. they are set to ReadOnly as soon as they are written to once; therefore the command sequence consists only of Seq No 1 in Table 329.
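The write-then-lock behaviour of the ReadOnly data setup can be sketched in Python as follows. This is an illustrative model only, not the QA Device implementation: the class `MemoryVector`, the 16-word vector size, the 32-bit word width and the boolean Pass/Fail return are all assumptions made for the sketch. It shows MVal words being assigned to the words selected by WordSelect starting from the least significant bit, followed by a SetPerm-style lock, and a write-once variant that locks words as soon as they are written.

```python
from typing import List

WORDS_PER_VECTOR = 16  # assumption: each memory vector holds 16 words


class MemoryVector:
    """Hypothetical model of one M2+ memory vector."""

    def __init__(self, write_once: bool = False):
        self.words = [0] * WORDS_PER_VECTOR  # factory value: all zeros
        self.read_only = 0                   # bit i set means word i is ReadOnly
        self.write_once = write_once         # models the write-once behaviour

    def write(self, word_select: int, mval: List[int]) -> bool:
        """WriteM1+-style write; returns True for Pass, False for Fail."""
        selected = [i for i in range(WORDS_PER_VECTOR) if (word_select >> i) & 1]
        if len(mval) != len(selected):
            return False  # MVal must supply exactly one word per selected bit
        if word_select & self.read_only:
            return False  # at least one selected word is already ReadOnly
        for index, value in zip(selected, mval):
            self.words[index] = value & 0xFFFFFFFF  # assumption: 32-bit words
        if self.write_once:
            self.read_only |= word_select  # lock words as soon as written
        return True

    def set_perm(self, perm_val: int) -> int:
        """SetPerm-style call: mark the selected words ReadOnly."""
        self.read_only |= perm_val & ((1 << WORDS_PER_VECTOR) - 1)
        return self.read_only


vector = MemoryVector()
vector.write(0b0101, [0xAAAA, 0xBBBB])  # word 0 <- 0xAAAA, word 2 <- 0xBBBB
vector.set_perm(0b0101)                 # both words become ReadOnly
```

In this model a device with `write_once=True` needs only the write step, since each written word is locked immediately.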

[7748] 32.2.3 Defining Fields in M0

[7749] The QACo must determine the field definitions for M0 depending on the application of the QA Device. These field definitions will consist of the following:

[7750] Number of fields and the size of each field.

[7751] The Type attribute of each field.

[7752] The access permission for each field.

[7753] The following fields are presently defined in an Ink QA Device:

[7754] ink-remaining field. See Section 26 for details.

[7755] Preauthorisation field. See Section 31.4.3 for details.

[7756] Sequence data fields SEQ_1 and SEQ_2. See Section 26 for details.

[7757] The following fields are presently defined in a Printer QA Device:

[7758] Operating parameter field. See Section 28 for details.

[7759] Sequence data fields SEQ_1 and SEQ_2. See Section 26 for details.

[7760] After the field definitions are determined, they are formatted as per Section 8.1.1.4. These formatted values are then written to M1 using the WriteM1+ function.

TABLE 329 Defining M0 fields command sequence
Sequence No | Function | Command
1 | B.WriteM1+ | VectNum = 1, WordSelect = the selected words corresponding to the attribute field/fields of M0, MVal = words corresponding to word select starting from LSW. ResultFlag = Pass/Fail.

[7761] 32.2.4 Writing Values to Fields in M0

[7762] The writing of M0 fields for an Ink QA Device will typically occur when the ink cartridge is filled with physical ink for the first time, and the equivalent logical ink is written to the Ink QA Device. Refer to Section 31.7 for details.

[7763] The writing of M0 fields for a Printer QA Device will typically occur when the printer parameters are written for the first time. The procedure for writing a printer parameter for the first time is exactly the same as for upgrading a printer parameter. Refer to Section 31.5 for details.

[7764] Before any value is written to a field, the key slot containing the key which has authenticated ReadWrite access to the field must be locked.

[7765] Both the Ink QA Device and the Printer QA Device have sequence data fields SEQ_1 and SEQ_2 as described in Section 27. These two fields must be initialised to 0xFFFFFFFF; refer to Section 27 for details.

[7766] The Ink QA Device/Printer QA Device and the trusted QA Device writing to it share the sequence key or a variant sequence key between them, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the Ink QA Device/Printer QA Device and A is the trusted QA Device. The command sequence used is described in Table 330.

TABLE 330 Command sequence for writing sequence data fields to the QA Devices
Sequence No | Function | Parameters
1 | B.Random | R_(B) = R_(L)
2 | A.SignM | KeyRef = n2, FieldSelect = select bits corresponding to SEQ_1 and SEQ_2, FieldVal = both fields set to 0xFFFFFFFF (refer to Section 31.4.3.3), ChipId = ChipId of B, R_(E) = R_(B). If ResultFlag = Pass then R_(A) = R_(L), SIG_(A) = SIGout. Refer to Section 27.1.3.1.
3 | B.WriteFieldsAuth | KeyRef = n1, FieldSelect = same as Seq 2 [FieldSelect], FieldVal = same as Seq 2 [FieldVal], R_(E) = R_(A), SIG_(E) = SIG_(A). ResultFlag = Pass/Fail.
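The key relationship B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID) can be illustrated with a short Python sketch. The definition of FormKeyVariant is not given in this section, so the sketch substitutes HMAC-SHA1 of the ChipId under the common key purely as an assumed stand-in. The function names `form_key_variant` and `keys_match`, the key length and the ChipId value are likewise hypothetical.

```python
import hashlib
import hmac


def form_key_variant(key: bytes, chip_id: bytes) -> bytes:
    """Assumed stand-in for FormKeyVariant: HMAC-SHA1(key, ChipId)."""
    return hmac.new(key, chip_id, hashlib.sha1).digest()


def keys_match(b_key: bytes, a_key: bytes, b_chip_id: bytes) -> bool:
    """True if B holds A's key itself or the variant of A's key for B's ChipId."""
    is_common = hmac.compare_digest(b_key, a_key)
    is_variant = hmac.compare_digest(b_key, form_key_variant(a_key, b_chip_id))
    return is_common or is_variant


trusted_key = bytes(range(20))                 # illustrative key held by A
chip_id = bytes.fromhex("0102030405060708")    # illustrative ChipId of B
variant_key = form_key_variant(trusted_key, chip_id)
```

With a variant key, each device B holds a key specific to its own ChipId, while the trusted device A needs only the common key to derive it.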

[7767] 32.3 Setting Up the Upgrading QA Device

[7768] The upgrading QA Device must be set up either as an Ink Refill QA Device or as a Parameter Upgrader QA Device.

[7769] Each upgrading QA Device must go through the following set up:

[7770] The upgrading QA Device must be set to factory defaults. Refer to Section 32.1. At the end of this process the upgrading QA Device is either an Ink Refill QA Device or a Parameter Upgrader QA Device with production batch keys and M0 fields set to default.

[7771] The upgrading QA Device must be programmed with the appropriate keys and upgrade data before it can start upgrading other QA Devices. The following must be performed on each upgrade QA Device:

[7772] a. The upgrading QA Device must be programmed with the appropriate keys required to upgrade other QA Devices and to upgrade itself when necessary.

[7773] b. The M0 fields must be correctly defined and set in M1.

[7774] For an Ink Refill QA Device the ink-remaining field must be defined and set. For a printer upgrade QA Device the upgrade value field and the count-remaining field must be defined and set. All upgrade QA Devices must also have sequence data fields SEQ_1 and SEQ_2, which are used to upgrade the upgrading QA Device itself.

[7775] c. Finally, the M0 fields defined in b must be written with appropriate values so that the upgrade QA Device can perform upgrades.

[7776] An Ink Refill QA Device will typically store the logical ink equivalent to the physical ink in a refill station; hence the Ink Refill QA Device's ink-remaining field must be written with the equivalent logical ink amount.

[7777] For a Parameter Upgrader QA Device the upgrade value field and the count-remaining field must be written. The upgrade value depends on the type of upgrade the Parameter Upgrader QA Device can perform, e.g. one Parameter Upgrader QA Device can upgrade to 10 ppm (pages per minute) while another Parameter Upgrader QA Device can upgrade to 5 ppm. The count-remaining is the number of times the Parameter Upgrader QA Device is permitted to write the associated upgrade value to other QA Devices. The count-remaining field must be written with a positive non-zero value for the Parameter Upgrader QA Device to perform successful upgrades. Refer to Section 32.3.1 and Section 32.3.2 for details.
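The role of the upgrade value and count-remaining fields can be sketched in Python as follows. This is a simplified model under stated assumptions: the class `ParameterUpgrader`, the dictionary standing in for a printer QA Device and the field name `print_speed_ppm` are hypothetical, and the authentication steps are omitted.

```python
class ParameterUpgrader:
    """Hypothetical model of the two M0 fields of a Parameter Upgrader QA Device."""

    def __init__(self, upgrade_value: int, count_remaining: int):
        self.upgrade_value = upgrade_value      # e.g. a print speed in ppm
        self.count_remaining = count_remaining  # upgrades still permitted

    def upgrade(self, printer_fields: dict, field_name: str) -> bool:
        """Write the upgrade value to a printer field if any count remains."""
        if self.count_remaining <= 0:
            return False  # no upgrades left: the upgrade fails
        printer_fields[field_name] = self.upgrade_value
        self.count_remaining -= 1
        return True


upgrader = ParameterUpgrader(upgrade_value=10, count_remaining=2)
printer_a = {"print_speed_ppm": 5}
printer_b = {"print_speed_ppm": 5}
printer_c = {"print_speed_ppm": 5}
results = [upgrader.upgrade(p, "print_speed_ppm")
           for p in (printer_a, printer_b, printer_c)]
```

Here the third upgrade is refused because count-remaining has reached zero, which is why the field must start at a positive non-zero value.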

[7778] 32.3.1 Setting Up the Ink Refill QA Device

[7779] 32.3.1.1 Setting Up the Keys

[7780] The Ink Refill QA Device could be transferring ink between peers or transferring ink down the hierarchy. Accordingly, the peer to peer Ink Refill QA Device has two keys (fill/refill key and sequence key) as described in Section 27, and an Ink Refill QA Device transferring down the hierarchy has three keys (fill/refill key, transfer key and sequence key). These keys must be programmed into the Ink Refill QA Device using the sequence described in Section 32.2.1.

[7781] The Key Programming QA Device must be programmed with the appropriate production batch keys, and the fill/refill key, transfer key and sequence key.

[7782] The GetProgramKey function is called on the Key Programming QA Device with OldKeyRef (refer to Section 32.2.1) pointing to a production batch key, and NewKeyRef (refer to Section 32.2.1) pointing to either a fill/refill key, a transfer key or a sequence key. The outputs from GetProgramKey (signature and encrypted new key) are passed in to the ReplaceKey function of the Ink Refill QA Device.

[7783] The GetProgramKey function must be called (on the Key Programming QA Device) for replacing each of the production batch keys in the Ink Refill QA Device. The output of GetProgramKey is passed in to the ReplaceKey function called on the Ink Refill QA Device. Successful processing of the ReplaceKey function replaces an old key (a production batch key) with the corresponding new key (either a fill/refill key, a transfer key or a sequence key).

[7784] 32.3.1.2 Setting Up the M0 Field Information in M1

[7785] The ink-remaining field and the sequence data fields SEQ_1 and SEQ_2 must be defined and set in the Ink Refill QA Device using the sequence described in Section 32.2.3.

[7786] 32.3.1.3 Transferring Ink Amounts

[7787] Finally, the logical ink amounts are transferred to the ink-remaining field using the sequence described in Section 31.7.

[7788] The QACo will transfer to the ComCo Ink Refill QA Device at the top of the hierarchy using the command sequence in Table 331.

[7789] For a successful transfer from QACo to ComCo, ComCo and QACo must share a common key or a variant key, i.e. ComCo.K_(n1)=QACo.K_(n2) or ComCo.K_(n1)=FormKeyVariant(QACo.K_(n2), ComCo.ChipID). K_(n1) is the fill/refill key for the ComCo refill QA Device.

TABLE 331 Command sequence for writing ink-remaining amounts to the highest QA Device in the hierarchy
Sequence No | Function | Parameters
1 | B.Random | R_(B) = R_(L)
2 | A.SignM | KeyRef = n2, FieldSelect = select bit corresponding to the ink-remaining field, FieldVal = ink amount to be transferred (refer to Section 31.4.3.3), ChipId = ChipId of B, R_(E) = R_(B). If ResultFlag = Pass then R_(A) = R_(L), SIG_(A) = SIGout. Refer to Section 27.1.3.1.
3 | B.WriteFieldsAuth | KeyRef = n1, FieldSelect = same as Seq 2 [FieldSelect], FieldVal = same as Seq 2 [FieldVal], R_(E) = R_(A), SIG_(E) = SIG_(A). ResultFlag = Pass/Fail.

[7790] 32.3.1.4 Setting Up Sequence Data Fields

[7791] The Ink Refill QA Device has sequence data fields SEQ_1 and SEQ_2 (as described in Section 27) because its ink-remaining fields can be refilled as well. These two fields must be initialised to 0xFFFFFFFF; refer to Section 27 for details.

[7792] The Ink Refill QA Device and the trusted QA Device writing to it share the sequence key or a variant sequence key between them, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the Ink Refill QA Device and A is the trusted QA Device. The command sequence used is described in Table 331.

[7793] 32.3.2 Setting Up the Parameter Upgrader QA Device

[7794] 32.3.2.1 Setting Up the Keys

[7795] The Parameter Upgrader QA Device could be transferring upgrades between peers or transferring upgrades down the hierarchy. Accordingly, the peer to peer Parameter Upgrader QA Device has three keys (write-parameter key, fill/refill key and sequence key) as described in Section 28.6 and Section 26, and a Parameter Upgrader QA Device transferring down the hierarchy has four keys (write-parameter key, fill/refill key, transfer key and sequence key). These keys must be programmed into the Parameter Upgrader QA Device using the sequence described in Section 32.2.1.

[7796] The Key Programming QA Device must be programmed with the appropriate production batch keys, and the write-parameter key, fill/refill key, transfer key and sequence key.

[7797] The GetProgramKey function is called on the Key Programming QA Device with OldKeyRef (refer to Section 32.2.1) pointing to a production batch key, and NewKeyRef (refer to Section 32.2.1) pointing to either a write-parameter key, a fill/refill key, a transfer key or a sequence key. The outputs from GetProgramKey (signature and encrypted new key) are passed in to the ReplaceKey function of the Parameter Upgrader QA Device.
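The key sets stated above for the Ink Refill QA Device and the Parameter Upgrader QA Device, in their peer to peer and hierarchical forms, can be summarised in a small Python lookup. The dictionary keys, the string labels and the helper `missing_keys` are hypothetical names chosen for illustration. The sketch only checks which keys still need to be programmed and says nothing about how key replacement itself is carried out.

```python
# Required keys per (device kind, transfer mode), as listed in the text above.
REQUIRED_KEYS = {
    ("ink_refill", "peer_to_peer"): {"fill/refill", "sequence"},
    ("ink_refill", "hierarchical"): {"fill/refill", "transfer", "sequence"},
    ("parameter_upgrader", "peer_to_peer"): {
        "write-parameter", "fill/refill", "sequence"},
    ("parameter_upgrader", "hierarchical"): {
        "write-parameter", "fill/refill", "transfer", "sequence"},
}


def missing_keys(device_kind: str, transfer_mode: str, programmed) -> set:
    """Return the required keys that have not yet been programmed."""
    return REQUIRED_KEYS[(device_kind, transfer_mode)] - set(programmed)


still_needed = missing_keys(
    "parameter_upgrader", "hierarchical", ["fill/refill", "sequence"])
```

A setup tool could call such a check before allowing a device to start upgrading other QA Devices.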

[7798] 32.3.2.2 Setting Up the M0 Field in M1

[7799] The upgrade value field and the count-remaining field must be defined and set in the upgrade QA Device using the sequence described in Section 32.2.3.

[7800] 32.3.2.3 Writing Upgrade Value to the Upgrade Field

[7801] The upgrade value is written to the upgrade field using the write-parameter key. The upgrade QA Device and the trusted QA Device writing to it share the write-parameter key or a variant write-parameter key between them, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the upgrade QA Device and A is the trusted QA Device. The command sequence used is described in Table 331.

[7802] 32.3.2.4 Transferring Count-Remaining Amounts

[7803] Finally, the logical count-remaining amounts are transferred to the count-remaining field using the sequence described in Section 31.7.

[7804] The QACo will also transfer to the ComCo's upgrade QA Device using the command sequence in Table 331.

[7805] For a successful transfer from QACo to ComCo, ComCo and QACo must share a common key or a variant key, i.e. ComCo.K_(n1)=QACo.K_(n2) or ComCo.K_(n1)=FormKeyVariant(QACo.K_(n2), ComCo.ChipID). K_(n1) is the fill/refill key for the ComCo upgrade QA Device.

[7806] 32.3.2.5 Setting Up Sequence Data Fields

[7807] The Parameter Upgrader QA Device has sequence data fields SEQ_1 and SEQ_2 (as described in Section 27) because its count-remaining fields can be refilled as well. These two fields must be initialised to 0xFFFFFFFF; refer to Section 27 for details.

[7808] The Parameter Upgrader QA Device and the trusted QA Device writing to it share the sequence key or a variant sequence key between them, i.e. B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the Parameter Upgrader QA Device and A is the trusted QA Device. The command sequence used is described in Table 331.

[7809] 32.4 Setting Up the Key Programmer

[7810] The key programming QA Device is set up to replace keys in other QA Devices.

[7811] Each key programming QA Device must go through the following set up:

[7812] The key programming QA Device must be instantiated to factory defaults. Refer to Section 32.1. At the end of instantiation the key programming QA Device has production batch keys and no key replacement data.

[7813] The key programming QA Device must be programmed with the appropriate keys and key replacement map before it can start to replace keys in other QA Devices.

[7814] 32.4.1 Setting Up the Keys

[7815] The key programming QA Device must be programmed with the key replacement map key. The key replacement map key is described in detail in Section 24.

[7816] The key programming QA Device must be programmed with the old and new keys for the QA Devices it is going to perform key replacement on.

[7817] Each of the keys is set in the key programming QA Device using the sequence described in Section 32.2.1.

[7818] 32.4.2 Setting Up Key Replacement Map Field Information

[7819] First, the key replacement map field information is worked out as per Section 24.1. This field information is set in M1 as per the sequence described in Section 32.2.3.

[7820] 32.4.3 Setting Up Key Replacement Map

[7821] Finally, the key replacement map field must be written with the valid mapping using the key replacement map key. The key programming QA Device and the trusted QA Device writing to it must share the key replacement map key or a variant of the key replacement map key between them.

[7822] For a successful write of the key replacement map, B.K_(n1)=A.K_(n2) or B.K_(n1)=FormKeyVariant(A.K_(n2), B.ChipID), where B is the key replacement QA Device and A is the trusted QA Device. The command sequence used is described in Table 331.

[7823] Appendix A: Field Types

[7824] Table 332 lists the field types that are specifically required by the QA Chip Logical Interface and therefore apply across all applications. Additional field types are application specific, and are defined in the relevant application documentation.

TABLE 332 Predefined Field Types
Value | Type | Description
0x0000 | 0 | Non-initialised (default value after final program load)
0x0001 | TYPE_PREAUTH | Defines a preauth field in an Ink QA Device
0x0002 | TYPE_COUNT_REMAINING | Defines a countRemaining field in a Parameter Upgrader QA Device
0x0003 | TYPE_SEQ_1 | Defines a sequence data field SEQ_1 in an Ink QA Device, a Printer QA Device or an upgrader QA Device
0x0004 | TYPE_SEQ_2 | Defines a sequence data field SEQ_2 in an Ink QA Device, a Printer QA Device or an upgrader QA Device
0x0005 | TYPE_KEY_MAP | Defines a key replacement map in a Key Programmer QA Device
0x0006 and above | reserved | Reserved for future use
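The predefined field types of Table 332 map naturally onto an enumeration. The following Python sketch uses the values and type names from the table. The class name `FieldType`, the member name `NON_INITIALISED`, the helper `describe_field_type` and the assumption that a type value fits in 16 bits are introduced here for illustration only.

```python
from enum import IntEnum


class FieldType(IntEnum):
    """Predefined field types, with values taken from Table 332."""
    NON_INITIALISED = 0x0000  # default value after final program load
    TYPE_PREAUTH = 0x0001
    TYPE_COUNT_REMAINING = 0x0002
    TYPE_SEQ_1 = 0x0003
    TYPE_SEQ_2 = 0x0004
    TYPE_KEY_MAP = 0x0005


def describe_field_type(value: int) -> str:
    """Name a type value; 0x0006 and above is treated as reserved here."""
    if not 0 <= value <= 0xFFFF:  # assumption: type values are 16 bits wide
        raise ValueError("field type must fit in 16 bits")
    if value >= 0x0006:
        return "reserved"
    return FieldType(value).name


name_for_seq_1 = describe_field_type(0x0003)
```

Application specific types such as TYPE_PRINT_SPEED are not included because this section does not give their values.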

[7825] Appendix B: Key and field definition for different QA Devices

[7826] B.1 Parameter Upgrader QA Device

[7827] B.1.1 Peer to Peer QA Device

TABLE 333 Key definitions for a peer to peer Parameter Upgrader QA Device
Key Name | Purpose
Fill/refill Key | This key is used for upgrading count-remaining values when the upgrade QA Device is upgraded by another upgrade QA Device, and is also used to decrement the count-remaining when upgrading other QA Devices.
Sequence Key | This key is used to initialise sequence data fields SEQ_1 and SEQ_2 to 0xFFFFFFFF.
Write Parameter Key | This key is used to write the upgrade value to the Parameter Upgrader QA Device.

[7828] TABLE 334 Field definitions for a peer to peer Parameter Upgrader QA Device
Field Name | Purpose | Type | KeyNum | A^(a) RW | NA^(b) RW | KPerms^(c) | EndPos (Size)
CountRemaining | The field stores the number of times the Parameter Upgrader QA Device is permitted to upgrade a printer QA Device. | TYPE_COUNT_REMAINING | SN^(f) fill/refill key | 1 | 0 | KPerms[KN^(e)] = 1. Rest are 0. | Depends on the maximum number of upgrades that can be stored.
UpgradeValue | This stores the value that is copied from the Parameter Upgrader QA Device to the field being upgraded on the printer QA Device during the upgrade. | Must define the type of the upgrade value, i.e. TYPE_PRINT_SPEED^(d) | SN^(f) write-parameter key | 1 | 0 | KPerms[KN^(e)] = 0. Rest are 0 as well. | Set as per upgrade value.
SEQ_1 | This field holds the data for sequence data field SEQ_1 when the Parameter Upgrader QA Device is being upgraded by another Parameter Upgrader Refill QA Device. | TYPE_SEQ_1 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0 as well. | Typically 32 bit.
SEQ_2 | This field holds the data for sequence data field SEQ_2 when the Parameter Upgrader QA Device is being upgraded by another Parameter Upgrader Refill QA Device. | TYPE_SEQ_2 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0 as well. | Typically 32 bit.

[7829] B.1.2 Hierarchical Transfer QA Device

[7830] Key Definitions

TABLE 335 Key definitions for a Parameter Upgrader QA Device (transferring down the hierarchy)
Key Name | Purpose
Transfer Key | This key is used to decrement the count-remaining when upgrading other QA Devices.
Fill/refill Key | This key is used for upgrading count-remaining values when the Parameter Upgrader QA Device is upgraded by another Parameter Upgrader Refill QA Device.
Sequence Key | This key is used to initialise sequence data fields SEQ_1 and SEQ_2 to 0xFFFFFFFF.
Write Parameter Key | This key is used to write the upgrade value to the Parameter Upgrader QA Device.

[7831] Field Definitions

TABLE 336 Field definitions for a Parameter Upgrader QA Device transferring down the hierarchy
Field Name | Purpose | Type | KeyNum | A^(a) RW | NA^(b) RW | KPerms^(c) | EndPos (Size)
CountRemaining | The field stores the number of times the Parameter Upgrader QA Device is permitted to upgrade a printer QA Device. | TYPE_COUNT_REMAINING | SN^(f) fill/refill key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[Transfer Key] = 1. Rest are 0. | Depends on the maximum number of upgrades that can be stored.
UpgradeValue | This stores the value that is copied from the Parameter Upgrader QA Device to the field being upgraded on the printer QA Device during the upgrade. | Must define the type of the upgrade value, i.e. TYPE_PRINT_SPEED^(d) | SN^(f) write-parameter key | 1 | 0 | KeyPerms[KN^(e)] = 0. Rest are 0. | Set as per upgrade value.
SEQ_1 | This field holds the data for sequence data field SEQ_1 when the Parameter Upgrader QA Device is being upgraded by another Parameter Upgrader Refill QA Device. | TYPE_SEQ_1 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0 as well. | Typically 32 bit.
SEQ_2 | This field holds the data for sequence data field SEQ_2 when the Parameter Upgrader QA Device is being upgraded by another Parameter Upgrader Refill QA Device. | TYPE_SEQ_2 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0 as well. | Typically 32 bit.

[7832] B.2 Ink Refill QA Device

[7833] B.2.1 Peer to Peer QA Device

[7834] Key Definitions

TABLE 337 Key definitions for a peer to peer Ink Refill QA Device
Key Name | Purpose
Fill/refill Key | This key is used for filling/refilling ink-remaining values when the Ink Refill QA Device is upgraded by another Ink Refill QA Device, and is also used to decrement from the ink-remaining when transferring ink to other QA Devices (typically an Ink QA Device).
Sequence Key | This key is used to initialise sequence data fields SEQ_1 and SEQ_2 to 0xFFFFFFFF.

[7835] Field Definitions

TABLE 338 Field definitions for a peer to peer Ink Refill QA Device
Field Name | Purpose | Type | Key Num | A^(a) RW | NA^(b) RW | KeyPerms^(c) | EndPos (Size)
InkRemaining | The field stores the amount of logical ink-remaining in the Ink Refill QA Device. | Must define the type of ink, e.g. TYPE_HIGHQUALITY_BLACK_INK^(d) | SN^(f) fill/refill key | 1 | 1 | KeyPerms[KN^(e)] = 1. Rest are 0. | Depends on the maximum amount of ink that can be stored and the storage resolution, i.e. in pico litres or in micro litres.
SEQ_1 | This field holds the data for sequence data field SEQ_1 when the Ink Refill QA Device is being filled/refilled by another Ink Refill QA Device. | TYPE_SEQ_1 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0 as well. | Typically 32 bit.
SEQ_2 | This field holds the data for sequence data field SEQ_2 when the Ink Refill QA Device is being filled/refilled by another Ink Refill QA Device. | TYPE_SEQ_2 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0 as well. | Typically 32 bit.

[7836] B.2.2 Hierarchical Transfer QA Device

[7837] Key Definitions

TABLE 339 Key definitions for an Ink Refill QA Device (transferring down the hierarchy)
Key Name | Purpose
Transfer Key | This key is used to decrement from the ink-remaining when transferring ink to other QA Devices.
Fill/refill Key | This key is used for filling/refilling ink-remaining values when the Ink Refill QA Device is upgraded by another Ink Refill QA Device.
Sequence Key | This key is used to initialise sequence data fields SEQ_1 and SEQ_2 to 0xFFFFFFFF.

[7838] Field Definitions

TABLE 340 Field definitions for an Ink Refill QA Device (transferring down the hierarchy)
Field Name | Purpose | Type | KeyNum | A^(a) RW | NA^(b) RW | KeyPerms^(c) | EndPos (Size)
InkRemaining | The field stores the amount of logical ink-remaining in the Ink Refill QA Device. | Must define the type of ink, e.g. TYPE_HIGHQUALITY_BLACK_INK^(d) | SN^(f) fill/refill key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[Transfer Key] = 1. Rest are 0. | Depends on the maximum amount of ink that can be stored and the storage resolution, i.e. in pico litres or in micro litres.
SEQ_1 | This field holds the data for sequence data field SEQ_1 when the Ink Refill QA Device is being filled/refilled by another Ink Refill QA Device. | TYPE_SEQ_1 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0. | Typically 32 bit.
SEQ_2 | This field holds the data for sequence data field SEQ_2 when the Ink Refill QA Device is being filled/refilled by another Ink Refill QA Device. | TYPE_SEQ_2 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0. | Typically 32 bit.

[7839] B.3 Key Programming QA Device

[7840] B.3.1 Key Definitions

TABLE 341 Key definitions for a Key Programming QA Device
Key Name | Purpose
Key replacement map Key | This key is used to write the key replacement map.
Old Keys | These are the old keys of the QA Device whose keys will be replaced by the Key Programming QA Device.
New Keys | These are the new keys of the QA Device whose old keys will be replaced by the Key Programming QA Device.

[7841] B.3.2 Field Definitions

TABLE 342 Field definitions for a key replacement QA Device
Field Name | Purpose | Type | KeyNum | A^(a) RW | NA^(b) RW | KPerms^(c) | EndPos (Size)
Key replacement map | This defines the mapping between the old key and the new key for the QA Device whose old key will be replaced by the new key. | TYPE_KEY_MAP | Key Replacement Map key | 1 | 0 | KPerms[KN^(d)] = 0. Rest are 0. | 2 words (64 bits)

[7842] B.4 Ink QA Device

[7843] B.4.1 Key Definitions

TABLE 343 Key definitions for an Ink QA Device
Key Name | Purpose
Fill/refill Key | This key is used for filling/refilling the ink-remaining amount in the Ink QA Device.
Ink usage Key | This key is used for verifying the data read from the Ink QA Device and for writing preauth data.
Sequence Key | This key is used to initialise sequence data fields SEQ_1 and SEQ_2 to 0xFFFFFFFF.

[7844] B.4.2 Field Definitions

TABLE 344 Field definitions for an Ink QA Device
Field Name | Purpose | Type | Key Num | A^(a) RW | NA^(b) RW | KPerms^(c) | EndPos (Size)
InkRemaining | The amount of logical ink-remaining in the Ink QA Device. More than one ink-remaining field may be present depending on the number of physical inks stored in the ink cartridge. | Must define the type of ink, i.e. TYPE_HQ_BLACK_INK^(d) | SN^(f) fill/refill key | 1 | 1 | KPerms[KN^(e)] = 1. Rest are 0. | Depends on the maximum amount of ink that can be stored and the storage resolution, i.e. in pico litres or in micro litres.
Preauth | This field defines the preauth value. | TYPE_PREAUTH | SN^(f) ink usage key | 0 | 1 | KPerms[KN^(e)] = 0. Rest are 0. | Depends on preauth amount. Typically 32 bits, may be 64 bits to accommodate larger preauth amounts.
SEQ_1 | This field holds the data for sequence data field SEQ_1 when the Ink QA Device is being filled/refilled by an Ink Refill QA Device. | TYPE_SEQ_1 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0. | Typically 32 bit.
SEQ_2 | This field holds the data for sequence data field SEQ_2 when the Ink QA Device is being filled/refilled by another Ink Refill QA Device. | TYPE_SEQ_2 | SN^(f) sequence key | 1 | 0 | KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0. | Typically 32 bit.

[7845] B.5 Printer QA Device

[7846] B.5.1 Key Definition

TABLE 345 Key definitions for a Printer QA Device
Key Name | Purpose
Upgrade key (fill/refill key) | This key is used for writing/upgrading the functional parameter.
Ink usage Key | This key is used for verifying the data read from the Ink QA Device.
Sequence Key | This key is used to initialise sequence data fields SEQ_1 and SEQ_2 to 0xFFFFFFFF.
PECID/SOPECID Key | This key is used to verify the data read from the printer QA Device. This key is unique to each printer. It is also used to translate data from the Ink QA Device to the trusted printer system QA Device.

[7847] B.5.2 Field Definition

TABLE 346 Field definitions for a Printer QA Device

Field Name: Functional parameter
  Purpose: The field stores an upgradeable functional parameter. More than one functional parameter can be stored in the printer QA Device.
  Type: Must define the type of print speed i.e. TYPE_PRINT_SPEED^(d)
  Key Num: SN^(f) fill/refill key
  A^(a) RW: 1
  NA^(b) RW: 0
  KPerms^(c): KPerms[KN^(e)] = 0. Rest are 0.
  EndPos (Size): Set as per functional parameter.

Field Name: SEQ_1
  Purpose: This field holds the data for sequence data field SEQ_1 when the Printer QA Device is being filled/refilled by a Parameter Upgrader QA Device.
  Type: TYPE_SEQ_1
  Key Num: SN^(f) sequence key
  A^(a) RW: 1
  NA^(b) RW: 0
  KPerms^(c): KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0.
  EndPos (Size): Typically 32 bit.

Field Name: SEQ_2
  Purpose: This field holds the data for sequence data field SEQ_2 when the Printer QA Device is being filled/refilled by another Parameter Upgrader QA Device.
  Type: TYPE_SEQ_2
  Key Num: SN^(f) sequence key
  A^(a) RW: 1
  NA^(b) RW: 0
  KPerms^(c): KPerms[KN^(e)] = 0. KPerms[fill/refill^(g)] = 1. Rest are 0.
  EndPos (Size): Typically 32 bit.

[7848] B.6 Trusted Printer System QA Device

[7849] B.6.1 Key Definition

TABLE 347

Key Name            Purpose
PECID/SOPECID Key   This key is used to verify the data read from the printer QA Device. This key is unique to each printer. This key is also used for verifying translated data from the ink QA Device.

[7850] Introduction

[7851] 1 Background

[7852] This document describes a QA Chip that can be used to hold authentication keys together with circuitry specially designed to prevent copying. The chip is manufactured using a standard Flash memory manufacturing process, and is low cost enough to be included in consumables such as ink and toner cartridges. The implementation is approximately 1 mm² in a 0.25 micron flash process, and has an expected die manufacturing cost of approximately 10 cents in 2003.

[7853] Once programmed, the QA Chips as described here are compliant with the NSA export guidelines since they do not constitute a strong encryption device. They can therefore be practically manufactured in the USA (and exported) or anywhere else in the world.

[7854] Note that although the QA Chip is designed for use in authentication systems, it is microcoded, and can therefore be programmed for a variety of applications.

[7855] 2 Nomenclature

[7856] The following symbolic nomenclature is used throughout this document:

TABLE 348 Summary of symbolic nomenclature

Symbol                 Description
F[X]                   Function F, taking a single parameter X
F[X, Y]                Function F, taking two parameters, X and Y
X | Y                  X concatenated with Y
X ∧ Y                  Bitwise X AND Y
X ∨ Y                  Bitwise X OR Y (inclusive-OR)
X ⊕ Y                  Bitwise X XOR Y (exclusive-OR)
¬X                     Bitwise NOT X (complement)
X ← Y                  X is assigned the value Y
X ← {Y, Z}             The domain of assignment inputs to X is Y and Z
X = Y                  X is equal to Y
X ≠ Y                  X is not equal to Y
⇓X                     Decrement X by 1 (floor 0)
⇑X                     Increment X by 1 (modulo register length)
Erase X                Erase Flash memory register X
SetBits[X, Y]          Set the bits of the Flash memory register X based on Y
Z ← ShiftRight[X, Y]   Shift register X right one bit position, taking input bit from Y and placing the output bit in Z

[7857] 3 Pseudocode

[7858] 3.1 Asynchronous

[7859] The following pseudocode:

[7860] var=expression

[7861] means the var signal or output is equal to the evaluation of the expression.

[7862] 3.2 Synchronous

[7863] The following pseudocode:

[7864] var←expression

[7865] means the var register is assigned the result of evaluating the expression during this cycle.

[7866] 3.3 Expression

[7867] Expressions are defined using the nomenclature in Table 348 above. Therefore:

[7868] var=(a=b)

[7869] is interpreted as the var signal is 1 if a is equal to b, and 0 otherwise.

[7870] 4 Diagrams

[7871] Black lines are used to denote data, while red lines are used to denote 1-bit control-signal lines.

[7872] Logical Interface

[7873] 5 Introduction

[7874] The QA Chip has a physical and a logical external interface. The physical interface defines how the QA Chip can be connected to a physical System, while the logical interface determines how that System can communicate with the QA Chip. This section deals with the logical interface.

[7875] 5.1 Operating Modes

[7876] The QA Chip has four operating modes: Idle Mode, Program Mode, Trim Mode and Active Mode.

[7877] Active Mode is entered on power-on Reset when the fuse has been blown, and whenever a specific authentication command arrives from the System. Program code is only executed in Active Mode. When the reset program code has finished, or the results of the command have been returned to the System, the chip enters Idle Mode to wait for the next instruction.

[7878] Idle Mode is used to allow the chip to wait for the next instruction from the System.

[7879] Trim Mode is used to determine the clock speed of the chip and to trim the frequency during the initial programming stage of the chip (when Flash memory is garbage). The clock frequency must be trimmed via Trim Mode before Program Mode is used to store the program code.

[7880] Program Mode is used to load up the operating program code, and is required because the operating program code is stored in Flash memory instead of ROM (for security reasons).

[7881] Apart from while the QA Chip is executing Reset program code, it is always possible to interrupt the QA Chip and change from one mode to another.

[7882] 5.1.1 Active Mode

[7883] Active Mode is entered in any of the following three situations:

[7884] power-on Reset when the fuse has been blown

[7885] receiving a command consisting of a global id write byte (0x00) followed by the ActiveMode command byte (0x06)

[7886] receiving a command consisting of a local id byte write followed by some number of bytes representing opcode and data.

[7887] In all cases, Active Mode causes execution of program code previously stored in the flash memory via Program Mode.

[7888] If Active Mode is entered by power-on Reset or the global id mechanism, the QA Chip executes specific reset startup code, typically setting up the local id and other IO specific data. The reset startup code cannot be interrupted except by a power-down condition. The power-on reset startup mechanism cannot be used before the fuse has been blown since the QA Chip cannot tell whether the flash memory is valid or not. In this case the global id mechanism must be used instead.

[7889] If Active Mode is entered by the local id mechanism, the QA Chip executes specific code depending on the following bytes, which function as opcode plus data. The interpretation of the following bytes depends on whatever software happens to be stored in the QA Chip.

[7890] 5.1.2 Idle Mode

[7891] The QA Chip starts up in Idle Mode when the fuse has not yet been blown, and returns to Idle Mode after the completion of another mode. When the QA Chip is in Idle Mode, it waits for a command from the master by watching the low speed serial line for an id that matches either the global id (0x00), or the chip's local id.

[7892] If the primary id matches the global id (0x00, common to all QA Chips), and the following byte from the master is the Trim Mode id byte, and the fuse has not yet been blown, the QA Chip enters Trim Mode and starts counting the number of internal clock cycles until the next byte is received. Trim Mode cannot be entered if the fuse has been blown.

[7893] If the primary id matches the global id (0x00, common to all QA Chips), and the following byte from the master is the Program Mode id byte, and the fuse has not yet been blown, the QA Chip enters Program Mode. Program Mode cannot be entered if the fuse has been blown.

[7894] If the primary id matches the global id (0x00, common to all QA Chips), and the following byte from the master is the Active Mode id byte, the QA Chip enters Active Mode and executes startup code, allowing the chip to set itself into a state to subsequently receive authentication commands (includes setting a local id and a trim value).

[7895] If the primary id matches the chip's local id, the QA Chip enters Active Mode, allowing the subsequent command to be executed.

[7896] The valid 8-bit serial mode values sent after a global id are as shown in Table 349:

TABLE 349 Command byte values to place chip in specific mode

Value      Interpretation
10101011   Trim Mode (0xAB) (only functions when the fuse has not been blown)
10001101   Program Mode (0xAD) (only functions when the fuse has not been blown)
00000110   Active Mode (0x06) (resets the chip & loads the localId)
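The Idle Mode decision described above can be sketched as a small function. This is an illustrative model only: the function name, the string return values, and the treatment of the local id as a whole id byte are assumptions, and the Program Mode constant simply takes the hexadecimal form printed in Table 349.

```python
GLOBAL_ID = 0x00
TRIM_MODE = 0xAB
PROGRAM_MODE = 0xAD  # assumption: hexadecimal form as printed in Table 349
ACTIVE_MODE = 0x06


def next_mode(id_byte: int, mode_byte: int, fuse_blown: bool, local_id_byte: int) -> str:
    """Return the mode an idle chip would enter (hypothetical model)."""
    if id_byte == GLOBAL_ID:
        if mode_byte == ACTIVE_MODE:
            return "Active"              # permitted whatever the fuse state
        if mode_byte == TRIM_MODE and not fuse_blown:
            return "Trim"
        if mode_byte == PROGRAM_MODE and not fuse_blown:
            return "Program"
        return "Idle"                    # unknown byte, or fuse already blown
    if id_byte == local_id_byte:
        return "Active"                  # following bytes act as opcode plus data
    return "Idle"                        # not addressed to this chip
```

For instance, a Trim Mode request sent to a chip whose fuse has been blown leaves the chip in Idle Mode, while an Active Mode request is accepted in either fuse state.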

[7897] 5.1.3 Trim Mode

[7898] Trim Mode is enabled by sending a global id byte (0x00) followed by the Trim Mode command byte (0xAB). Trim Mode can only be entered while the fuse has not yet been blown.

[7899] The purpose of Trim Mode is to set the trim value (an internal register setting) of the internal ring oscillator so that Flash erasures and writes are of the correct duration. This is necessary due to the 2:1 variation of the clock speed due to process variations. If writes and erasures are too long, the Flash memory will wear out faster than desired, and in some cases can even be damaged. Note that the 2:1 variation due to temperature still remains, so the effective operating speed of the chip is 7-14 MHz around a nominal 10 MHz.

[7900] Trim Mode works by measuring the number of system clock cycles that occur inside the chip from the receipt of the Trim Mode command byte until the receipt of a data byte. When the data byte is received, the data byte is copied to the trim register and the current value of the count is transmitted to the outside world.

[7901] Once the count has been transmitted, the QA Chip returns to Idle Mode.

[7902] At reset, the internal trim register setting is set to a known value r. The external user can now perform the following operations:

[7903] send the global id+write followed by the Trim Mode command byte

[7904] send the 8-bit value v over a specified time t

[7905] send a stop bit to signify no more data

[7906] send the global id+read followed by the Trim Mode command byte

[7907] receive the count c

[7908] send a stop bit to signify no more data

[7909] At the end of this procedure, the trim register will be v, and the external user will know the relationship between external time t and internal time c. Therefore a new value for v can be calculated.

[7910] The Trim Mode procedure can be repeated a number of times, varying both t and v in known ways, measuring the resultant c. At the end of the process, the final value for v is established (and stored in the trim register for subsequent use in Program Mode). This value v must also be written to the flash for later use (every time the chip is placed in Active Mode for the first time after power-up). For more information about the internal workings of Trim Mode and the accuracy of trim in the QA Chip, see Section 11.2 on page 967.
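The relationship between external time t and internal count c can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names, the target frequency argument and the 5% tolerance are hypothetical and are not taken from the chip specification.

```python
def internal_clock_hz(count: int, seconds: float) -> float:
    """Estimate the internal clock rate from count c measured over external time t."""
    if seconds <= 0:
        raise ValueError("t must be positive")
    return count / seconds


def within_tolerance(count: int, seconds: float, target_hz: float, tol: float = 0.05) -> bool:
    """True when the measured rate is within a fractional tolerance of the target.

    The 5% default is an illustrative value only.
    """
    return abs(internal_clock_hz(count, seconds) - target_hz) <= tol * target_hz
```

An external programmer could repeat the measurement with different trim values v until `within_tolerance` holds for the nominal frequency, then store that v.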

[7911] 5.1.4 Program Mode

[7912] Program Mode is enabled by sending a global id byte (0x00) followed by the Program Mode command byte.

[7913] If the QA Chip knows already that the fuse has been blown, it simply does not enter Program Mode. If the QA Chip does not know the state of the fuse, it determines whether or not the internal fuse has been blown by reading 32-bit word 0 of the information block of flash memory. If the fuse has been blown the remainder of data from the Program Mode command is ignored, and the QA Chip returns to Idle Mode.

[7914] If the fuse is still intact, the chip enters Program Mode and erases the entire contents of Flash memory. The QA Chip then validates the erasure. If the erasure was successful, the QA Chip receives up to 4096 bytes of data corresponding to the new program code and variable data. The bytes are transferred in order byte₀ to byte₄₀₉₅.

[7915] Once all bytes of data have been loaded into Flash, the QA Chip returns to Idle Mode.

[7916] Note that Trim Mode functionality must be performed before a chip enters Program Mode for the first time. Otherwise the erasure and write durations could be incorrect.

[7917] Once the desired number of bytes have been downloaded in Program Mode, the LSS Master must wait for 80 μs (the time taken to write two bytes to flash at nybble rates) before sending the new transaction (e.g. Active Mode). Otherwise the last nybbles may not be written to flash.

[7918] 5.1.5 After Manufacture

[7919] Directly after manufacture the flash memory will be invalid and the fuse will not have been blown. Therefore power-on-reset will not cause Active Mode. Trim Mode must therefore be entered first, and only after a suitable trim value is found, should Program Mode be entered to store a program. Active Mode can be entered if the program is known to be valid.

[7920] Logical View of CPU

[7921] 6 Introduction

[7922] The QA Chip is a 32-bit microprocessor with on-board RAM for scratch storage, on-board flash for program storage, a serial interface, and specific security enhancements.

[7923] The high level commands that a user of a QA Chip sees are all implemented as small programs written in the CPU instruction set.

[7924] The following sections describe the memory model, the various registers, and the instruction set of the CPU.

[7925] 7 Memory Model

[7926] The QA Chip has its own internal memory, broken into the following conceptual regions:

[7927] RAM variables (3 Kbits = 96 entries at 32 bits wide), used for scratch storage (e.g. HMAC-SHA1 processing).

[7928] Flash memory (8 Kbytes main block + 128 bytes info block) used to hold the non-volatile authentication variables (including program keys etc), and program code. Only 4 Kbytes + 64 bytes is visible to the program addressing space due to shadowing. Shadowing is where half of each byte is used to validate and verify the other half, thus protecting against certain forms of physical and logical attacks. As a result, two bytes are read to obtain a single byte of data (this happens transparently).

[7929] 7.1 RAM

[7930] The RAM region consists of 96×32-bit words required for the general functioning of the QA Chip, but only during the operation of the chip. RAM is volatile memory: once power is removed, the values are lost. Note that in actual fact memory retains its value for some period of time after power-down, but cannot be considered to be available upon power-up. This has issues for security that are addressed in other sections of this document.

[7931] RAM is typically used for temporary storage of variables during chip operation. Short programs can also be stored and executed from the RAM.

[7932] RAM is addressed from 0x00 to 0x5F. Since RAM is in an unknown state upon a RESET (RstL), program code should not assume the contents to be 0. Program code can, however, set the RAM to be a particular known state during execution of the reset command (guaranteed to be received before any other commands).

[7933] 7.2 Flash Variables

[7934] The flash memory region contains the non-volatile information in the QA Chip. Flash memory retains its value after a RESET or if power is removed, and can be expected to be unchanged when the power is next turned on.

[7935] Byte 0 of main memory is the first byte of the program run for the command dispatcher. Note that the command dispatcher is always run with shadows enabled.

[7936] Bytes 0-7 of the information block flash memory are reserved as follows:

[7937] bytes 0-3=fuse. A value of 0x5555AAAA indicates that the fuse has been blown (think of a physical fuse whose wire is no longer intact).

[7938] bytes 4-7=random number used to XOR all data for RAM and flash memory accesses

[7939] After power-on reset (when the fuse is blown) or upon receipt of a globalId Active command, the 32-bit data from bytes 4-7 in the information block of Flash memory is loaded into an internal ChipMask register. In Active Mode (the chip is executing program code), all data read from the flash and RAM is XORed with the ChipMask register, and all data written to the flash and RAM is XORed with the ChipMask register before being written out. This XORing happens completely transparently to the program code. Main flash memory byte 0 onward is the start of program code. Note that byte 0 onward needs to be valid after being XORed with the appropriate bytes of ChipMask.
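The transparent ChipMask XOR can be pictured with the following minimal sketch. The function names are hypothetical, and the model treats every access as a 32-bit word, which is a simplification.

```python
MASK32 = 0xFFFFFFFF


def mask_write(value: int, chip_mask: int) -> int:
    """Value as physically stored: program data XORed with ChipMask."""
    return (value ^ chip_mask) & MASK32


def mask_read(stored: int, chip_mask: int) -> int:
    """Value seen by program code: stored data XORed with ChipMask again."""
    return (stored ^ chip_mask) & MASK32
```

Because XOR is its own inverse, a value written and then read back with the same ChipMask is unchanged, while the raw stored word differs from chip to chip.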

[7940] Even though CPU access is in 8-bit and 32-bit quantities, the data is actually stored in flash a nybble-at-a-time. Each nybble write is written as a byte containing 4 sets of b/¬b pairs. Thus every byte write to flash is writing a nybble to real and shadow. A write mask allows the individual targeting of nybble-at-a-time writes.

[7941] The checking of flash vs shadow flash is automatically carried out each read (each byte contains both flash and shadow flash). If all 8 bits are 1, the byte is considered to be in its erased form¹, and returns 0 as the nybble. Otherwise, the value returned for the nybble depends on the size of the overall access and the setting of bit 0 of the 8-bit WriteMask.

[7942] All 8-bit accesses (i.e. instruction and program code fetches) are checked to ensure that each byte read from flash is 4 sets of b/¬b pairs. If the data is not of this form, the chip hangs until a new command is issued over the serial interface.

[7943] With 32-bit accesses (i.e. data used by program code), each byte read from flash is checked to ensure that it is 4 sets of b/¬b pairs. A setting of WriteMask₀=0 means that if the data is not valid, then the chip will hang until a new command is issued over the serial interface. A setting of WriteMask₀=1 means that each invalid nybble is replaced by the upper nybble of the WriteMask. This allows recovery after a write or erasure is interrupted by a power-down.
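The shadowed storage and the 32-bit read rules above can be modelled as follows. This is a sketch only: it assumes each stored pair holds a bit and its complement, and the placement of the pairs within the byte (data bit at position 2i, shadow bit at 2i+1) is an assumption made purely for illustration. `ChipHang` is a hypothetical exception standing in for the chip hanging.

```python
class ChipHang(Exception):
    """Stands in for the chip hanging until a new serial command arrives."""


def encode_nybble(nybble: int) -> int:
    """Pack a nybble into a byte of four bit/complement pairs (assumed layout)."""
    byte = 0
    for i in range(4):
        b = (nybble >> i) & 1
        byte |= b << (2 * i)              # real bit
        byte |= (b ^ 1) << (2 * i + 1)    # shadow bit, the complement
    return byte


def decode_byte(byte: int, write_mask: int = 0) -> int:
    """Recover a nybble from a stored byte during a 32-bit data access."""
    if byte == 0xFF:
        return 0                          # erased form reads back as 0
    nybble = 0
    for i in range(4):
        b = (byte >> (2 * i)) & 1
        shadow = (byte >> (2 * i + 1)) & 1
        if b == shadow:                   # not a valid bit/complement pair
            if write_mask & 1:            # WriteMask bit 0 set: substitute
                return (write_mask >> 4) & 0xF
            raise ChipHang("invalid shadow pair")
        nybble |= b << i
    return nybble
```

With WriteMask bit 0 clear, an invalid byte raises `ChipHang`; with it set, the upper nybble of the WriteMask is returned in place of the damaged nybble.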

[7944] 8 Registers

[7945] A number of registers are defined for use by the CPU. They are used for control, temporary storage, arithmetic functions, counting and indexing, and for I/O.

[7946] These registers do not need to be kept in non-volatile (Flash) memory. They can be read or written without the need for an erase cycle (unlike Flash memory). Temporary storage registers that contain secret information still need to be protected from physical attack by Tamper Prevention and Detection circuitry and parity checks.

[7947] All registers are cleared to 0 on a RESET. However, program code should not assume any RAM contents have any particular state, and should set up register values appropriately. In particular, at the startup entry point, the various address registers need to be set up from unknown states.

[7948] 8.1 GO

[7949] A 1-bit GO register is 1 when the program is executing, and 0 when it is not. Programs can clear the GO register to halt execution of program code once the command has finished executing.

[7950] 8.2 Accumulator and Z Flag

[7951] The Accumulator is a 32-bit general-purpose register that can be thought of as the single data register. It is used as one of the inputs to all arithmetic operations, and is the register used for transferring information between memory registers.

[7952] The Z register is a 1-bit flag, and is updated each time the Accumulator is written to. The Z register contains the zero-ness of the Accumulator. Z=1 if the last value written to the Accumulator was 0, and 0 if the last value written was non-0.

[7953] Both the Accumulator and Z registers are directly accessible from the instruction set.

[7954] 8.3 Address Registers

[7955] 8.3.1 Program Counter Array and Stack Pointer

[7956] A 12-level deep 12-bit Program Counter Array (PCA) is defined. It is indexed by a 4-bit Stack Pointer (SP). The current Program Counter (PC), containing the address of the currently executing instruction, is effectively PCA[SP]. A single register bit, PCRamSel, determines whether the program is executing from flash or RAM (0=flash, 1=RAM).

[7957] The PC is affected by calling subroutines or returning from them, and by executing branching instructions. The SP is affected by calling subroutines or returning from them. There is no bounds checking on calling too many subroutines: the oldest entry in the execution stack will be lost.

[7958] The entry point for program code is defined to be address 0 in Flash. This entry point is used whenever the master signals a new transaction.

[7959] 8.3.2 A0-A3

[7960] There are four 8-bit address registers. Each register has an associated memory mode bit designating the address as in Flash (0) or RAM (1).

[7961] When an An register is pointing to an address in RAM, it holds the word number. When it is pointing to an address in Flash, it points to a set of 32-bit words that start at a 128-bit (16 byte) alignment. The A0 register has a special use of direct offset e.g. access is possible to (A0),0-7 which is the 32-bit word pointed to by A0 offset by the specified number of words.

[7962] 8.3.3 WriteMask

[7963] The WriteMask register is used to determine how many nybbles will be written during a 32-bit write to Flash, and whether or not an invalid nybble will be replaced during a read from Flash.

[7964] During writes to flash, bit n (of 8) determines whether nybble n is written. The unit of writing is a nybble since half of each byte is used for shadow data. A setting of 0xFF means that all 32 bits will be written to flash (as 8 sets of nybble writes).

[7965] During 32-bit reads from flash (occurs as 8 reads), the value of WriteMask₀ is used to determine whether a read of invalid data is replaced by the upper nybble of WriteMask. If 0, a read of invalid data is not replaced, and the chip hangs until a new command is issued over the serial interface. If 1, a read of invalid data is replaced by the upper nybble of the WriteMask.

[7966] Thus a WriteMask setting of 0 (reset setting) means that no writes will occur to flash, and all reads are not replaced (causing the program to hang if an invalid value is encountered).
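The effect of the WriteMask on a 32-bit write can be sketched as a nybble-by-nybble merge. The function name is hypothetical, nybble 0 is assumed to be the least significant nybble, and the model ignores the physical constraints of flash programming (erase state, timing).

```python
def masked_write(old: int, new: int, write_mask: int) -> int:
    """Merge `new` into `old`; bit n of write_mask enables nybble n.

    Assumes nybble 0 is the least significant nybble of the 32-bit word.
    """
    result = old
    for n in range(8):
        if (write_mask >> n) & 1:
            field = 0xF << (4 * n)
            result = (result & ~field) | (new & field)
    return result & 0xFFFFFFFF
```

A mask of 0xFF replaces all eight nybbles, while the reset setting of 0 leaves the stored word untouched.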

[7967] 8.4 Counters

[7968] A number of special purpose counters/index registers are defined:

TABLE 350 Counter/Index registers

Register Name   Size    Bits   Description
C1              1 × 3   3      Counter used to index arrays and general purpose counter
C2              1 × 6   6      General purpose counter and can be used to index arrays

[7969] All these counter registers are directly accessible from the instruction set. Special instructions exist to load them with specific values, and other instructions exist to decrement or increment them, or to branch depending on whether or not the specific counter is zero.

[7970] There are also 2 special flags (not registers) associated with C1 and C2, and these flags hold the zero-ness of C1 or C2. The flags are used for loop control, and are listed here, for although they are not registers, they can be tested like registers.

TABLE 351 Flags for testing C1 and C2

Name   Description
C1Z    1 = C1 is currently zero, 0 = C1 is currently non-zero.
C2Z    1 = C2 is currently zero, 0 = C2 is currently non-zero.

[7971] 8.5 RTMP

[7972] The single bit register RTMP allows the implementation of LFSRs and multiple precision shift registers.

[7973] During a rotate right (ROR) instruction with operand of RB, the bit shifted out (formerly bit 0) is written to the RTMP register. The bit currently in the RTMP register becomes the new bit 31 of the Accumulator. Performing multiple ROR RB commands over several 32-bit values implements a multiple precision rotate/shift right.

[7974] The XRB operand operates in the same way as RB, in that the current value in the RTMP register becomes the new bit 31 of the Accumulator. However with the XRB instruction, the bit formerly known as bit 0 does not simply replace RTMP (as in the RB instruction). Instead, it is XORed with RTMP, and the result stored in RTMP, thereby allowing the implementation of long LFSRs.
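The RB and XRB behaviour can be illustrated with a small model of the Accumulator and RTMP. The function names are hypothetical; each function returns the new Accumulator and the new RTMP value.

```python
def ror_rb(acc: int, rtmp: int) -> tuple[int, int]:
    """ROR RB: RTMP enters at bit 31, old bit 0 replaces RTMP."""
    new_acc = ((rtmp & 1) << 31) | ((acc & 0xFFFFFFFF) >> 1)
    return new_acc, acc & 1


def ror_xrb(acc: int, rtmp: int) -> tuple[int, int]:
    """ROR XRB: RTMP enters at bit 31, old bit 0 is XORed into RTMP."""
    new_acc = ((rtmp & 1) << 31) | ((acc & 0xFFFFFFFF) >> 1)
    return new_acc, (acc & 1) ^ (rtmp & 1)


def shift_right_multiword(words: list[int], rtmp: int = 0) -> tuple[list[int], int]:
    """Shift a multi-word value right by one bit; words[0] is most significant."""
    out = []
    for word in words:
        word, rtmp = ror_rb(word, rtmp)
        out.append(word)
    return out, rtmp
```

Applying `ror_rb` to each word in turn carries the bit shifted out of one word into the top of the next, which is the multiple precision shift the text describes.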

[7975] 8.6 Registers Used for I/O

[7976] Several registers are defined for communication between the master and the QA Chip. These registers are LocalId, InByte and OutByte.

[7977] LocalId (7 bits) defines the chip-specific id that this particular QA Chip will accept commands for.

[7978] InByte (8 bits) provides the means for the QA Chip to obtain the next byte from the master. OutByte (8 bits) provides the means for the QA Chip to send a byte of data to the master.

[7979] From the QA Chip's point of view:

[7980] Reads from InByte will hang until there is 1 byte of data present from the master.

[7981] Writes to OutByte will hang if the master has not already consumed the last OutByte.

[7982] When the master begins a new command transaction, any existing data in InByte and OutByte is lost, and the PC is reset to the entry point in the code, thus ensuring correct framing of data.

[7983] 8.7 Registers Used for Trimming Clock Speed

[7984] A single 8-bit Trim register is used to trim the ring oscillator clock speed. The register has a known value of 0x00 during reset to ensure that reads from flash will succeed at the fastest process corners, and can be set in one of two ways:

[7985] via Trim Mode, which is necessary before the QA Chip is programmed for the first time; or

[7986] via the CPU, which is necessary every time the QA Chip is powered up before any flash write or erasure accesses can be carried out.

[7987] 8.8 Registers Used for Testing Flash

[7988] There are a number of registers specifically for testing the flash implementation. A single 32-bit write to an appropriate RAM address allows the setting of any combination of these flash test registers.

[7989] RAM consists of 96×32-bit words, and can be pointed to by any of the standard An address registers. A write to a RAM address in the range 97-127 does nothing with the RAM (reads return 0), but a write to a RAM address in the range 0x80-0x87 will write to specific groupings of registers according to the low 3 bits of the RAM address. A 1 in the address bit means the appropriate part of the 32-bit Accumulator value will be written to the appropriate flash test registers. A 0 in the address bit means the register bits will be unaffected.

[7990] The registers and address bit groupings are listed in Table 352:

TABLE 352 Flash test registers settable from CPU in RAM address range 0x80-0x87²

adr bit   data bits   name              description
0         0           shadowsOff        0 = shadowing applies (nybble based flash access). 1 = shadowing disabled, 8-bit direct accesses to flash.
0         1           hiFlashAdr        Only valid when shadowsOff = 1. 0 = accesses are to lower 4 Kbytes of flash. 1 = accesses are to upper 4 Kbytes of flash.
0         2
1         3           enableFlashTest   0 = keep flash test register within the TSMC flash IP in its reset state. 1 = enable flash test register to take on non-reset values.
1         8-4         flashTest         Internal 5-bit flash test register within the TSMC flash IP (SFC008_08B9_HE). If this is written with 0x1E, then subsequent writes will be according to the TSMC write test mode. You must write a non-0x1E value or reset the register to exit this mode.
2         28-9        flashTime         When timerSel is 1, this value is used for the duration of the program cycle within a standard flash write or erasure. 1 unit = 16 clock cycles (16 × 100 ns typical). Regardless of timerSel, this value is also used for the timeout following power down detection before the QA Chip resets itself. 1 unit = 1 clock cycle (= 100 ns typical). Note that this means the programmer should set this to an appropriate value (e.g. 5 μs), just as the localId needs to be set.
2         29          timerSel          0 = use internal (default) timings for flash writes & erasures. 1 = use flashTime for flash writes and erasures.

[7991] When none of the address register bits 0-2 are set (e.g. a write to RAM address 0x80), then invalid writes will clear the illChip and retryCount registers.

[7992] For example, set the A0 register to be 0x80 in RAM. A write to (A0),0 will write to none of the flash test registers, but will clear the illChip and retryCount registers. A write to (A0),7 will write to all of the flash test registers. A write to (A0),2 will write to the enableFlashTest and flashTest registers only. A write to (A0),4 will write to the flashTime and timerSel registers etc.
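The address-bit grouping can be sketched as a lookup keyed on the low 3 bits of the RAM address. The dictionary and function names are hypothetical; only the register names listed by name in Table 352 are included.

```python
# Named registers per address bit, following Table 352.
GROUPS = {
    0: ("shadowsOff", "hiFlashAdr"),
    1: ("enableFlashTest", "flashTest"),
    2: ("flashTime", "timerSel"),
}


def registers_written(ram_address: int) -> list[str]:
    """List the flash test registers affected by a write to 0x80-0x87."""
    if not 0x80 <= ram_address <= 0x87:
        raise ValueError("address is outside the flash test register range")
    names = []
    for bit in range(3):
        if (ram_address >> bit) & 1:
            names.extend(GROUPS[bit])
    return names
```

With A0 holding 0x80, a write to (A0),2 corresponds to address 0x82 and reaches only enableFlashTest and flashTest.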

[7993] Finally, a write to address 0x88 in RAM will cause a device erasure. If infoBlockSel is 0, then the device erasure will only be of main memory. If infoBlockSel is 1, then the device erasure is of both main memory and the information block (which will also clear the ChipMask and the Fuse).

[7994] Reads of invalid RAM areas will reveal information as follows:

[7995] all invalid addresses in RAM (e.g. 0x80) will return the illChip flag in the low bit (illChip is set whenever 16 consecutive bad reads occur for a single byte in memory)

[7996] all invalid addresses in RAM with the low address bit set (e.g. 0x81, or (A0),1 when A0 holds 0x80), will additionally return the most recent retryCount setting (only updated by the chip when a bad read occurs), i.e. bit 0=illChip, bits 4-1=retryCount.
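The bit layout of the value read back can be unpacked as in the sketch below. The function name is hypothetical; the layout is the one just stated (bit 0 = illChip, bits 4-1 = retryCount).

```python
def decode_status(value: int) -> tuple[bool, int]:
    """Split a status read into (illChip, retryCount)."""
    ill_chip = bool(value & 1)
    retry_count = (value >> 1) & 0xF
    return ill_chip, retry_count
```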

[7997] 8.9 Register Summary

[7998] Table 353 provides a summary of the registers used in the CPU.

TABLE 353 Register summary

Register name          Description                                                                                                              #bits
A[0-3]                 address registers                                                                                                        4 × 9 = 36
Acc                    Accumulator                                                                                                              32
C1                     general purpose counter and index                                                                                        3
C2                     general purpose counter and index                                                                                        6
IllChip                gets set whenever more than 15 consecutive bad reads from flash occurred (and any program executing has hung)            1
InByte                 input byte from outside world                                                                                            8
Go                     determines whether CPU is executing                                                                                      1
LocalId                determines id for this chip's IO                                                                                         7
OutByte                output byte to outside world                                                                                             8
Z                      zero flag for last xfer to Acc                                                                                           1
PCA                    program counter array                                                                                                    12 × 12 = 144
PCRamSel               Program code is executing in flash (0) or ram (1)                                                                        1
RetryCount             counts the number of retries for bad reads                                                                               4
RTMP                   bit used to allow multi-word rotations                                                                                   1
SP                     stack pointer into PCA                                                                                                   4
Trim                   trims ring oscillator frequency                                                                                          8
flash test registers   various registers in the embedded flash and flash access logic specifically for testing the flash memory                30
TOTAL (bits)                                                                                                                                    295
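The total in Table 353 can be checked by summing the individual register widths. The dictionary below is only a transcription of the table for the purpose of this arithmetic check.

```python
# Register widths in bits, transcribed from Table 353.
REGISTER_BITS = {
    "A[0-3]": 4 * 9,
    "Acc": 32,
    "C1": 3,
    "C2": 6,
    "IllChip": 1,
    "InByte": 8,
    "Go": 1,
    "LocalId": 7,
    "OutByte": 8,
    "Z": 1,
    "PCA": 12 * 12,
    "PCRamSel": 1,
    "RetryCount": 4,
    "RTMP": 1,
    "SP": 4,
    "Trim": 8,
    "flash test registers": 30,
}

total_bits = sum(REGISTER_BITS.values())
```

The sum agrees with the 295 bits stated in the table.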

[7999] 8.10 Startup

[8000] Whenever the chip is powered up, or receives a ‘write’ command over the serial interface, the PC and PCRamSel get set to 0 and execution begins at 0 in Flash memory. The program (starting at 0) needs to determine how the program was started by reading the InByte register.

[8001] If the first byte read is 0xFF, the chip is being requested to perform software reset tasks. Execution of software reset can only be interrupted by a power down. The reset tasks include setting up RAM to contain known startup state information, setting up Trim and localId registers etc. The CPU signals that it is now ready to receive commands from an external device by writing to the OutByte register.

[8002] An external Master is able to read the OutByte (and any further outbytes that the CPU decides to send) if it so wishes by a read using the localId.

[8003] Otherwise the first byte read will be of the form where the least significant bit is 0, and bits 7-1 contain the localId of the device as read over the serial interface. This byte is usually discarded since it nominally only has a value of differentiation against a software reset request. The second and subsequent bytes contain the data message of a write using the localId. The CPU can prevent interruption during execution by writing 0 to the localId and then restoring the desired localId at the later stage.

[8004] 9 Instruction Set

[8005] The CPU operates on 8-bit instructions and typically on 32-bit data items. Each instruction typically consists of an opcode and operand, although the number of bits allocated to opcode and operand varies between instructions.

[8006] 9.1 Basic Opcodes (Summary)

[8007] The opcodes are summarized in Table 354:

TABLE 354 Opcode bit pattern map

Opcode     Mnemonic   Simple Description
0000xxxx   JMP        Jump
0001xxxx   JSR        Jump subroutine
0010xxxx   TBR        Test and branch
0011xxxx   DBR        Decrement and branch
0100xxxx   SC         Set counter to a value
0101xxxx   ST         Store Accumulator in specified location
0110000x   -          reserved
01100010   JPZ        Jump to 0
01100011   JPI        Jump indirect
011001xx   -          reserved
01101xxx   -          reserved
01110000   -          reserved
01110001   ERA        Erase page of flash memory pointed to by Accumulator
01110010   JSZ        Jump to subroutine at 0
01110011   JSI        Jump subroutine indirect
01110100   RTS        Return from subroutine
01110101   HALT       Stop the CPU
0111011x   -          reserved
01111xxx   LIA        Load immediate value into address register
10000xxx   AND        Bitwise AND Accumulator
10001xxx   OR         Bitwise OR Accumulator
1001xxxx   XOR        Exclusive-OR Accumulator
1010xxxx   ADD        Add a 32 bit value to the Accumulator
1011xxxx   LD         Load Accumulator
1100xxxx   ROR        Rotate Accumulator right
11010xxx   AND        Bitwise AND Accumulator⁵
11011xxx   OR         Bitwise OR Accumulator
11100xxx   XOR        Bitwise XOR Accumulator
11101xxx   ADD        Add a 32 bit value to the Accumulator
11110xxx   LD         Load Accumulator
11111xxx   RIA        Rotate Accumulator into address register
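The bit patterns in Table 354 can be matched mechanically, as in the sketch below. The table text is transcribed from Table 354 with "-" marking reserved encodings; the function names are hypothetical.

```python
# Bit patterns and mnemonics transcribed from Table 354 ("-" = reserved).
OPCODE_TABLE = """
0000xxxx JMP   0001xxxx JSR   0010xxxx TBR   0011xxxx DBR
0100xxxx SC    0101xxxx ST    0110000x -     01100010 JPZ
01100011 JPI   011001xx -     01101xxx -     01110000 -
01110001 ERA   01110010 JSZ   01110011 JSI   01110100 RTS
01110101 HALT  0111011x -     01111xxx LIA   10000xxx AND
10001xxx OR    1001xxxx XOR   1010xxxx ADD   1011xxxx LD
1100xxxx ROR   11010xxx AND   11011xxx OR    11100xxx XOR
11101xxx ADD   11110xxx LD    11111xxx RIA
""".split()
PATTERNS = list(zip(OPCODE_TABLE[0::2], OPCODE_TABLE[1::2]))


def matches(pattern: str, byte: int) -> bool:
    """True when every fixed bit of the pattern equals the byte's bit."""
    bits = format(byte, "08b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def decode(byte: int) -> str:
    """Return the mnemonic for an instruction byte, or 'reserved'."""
    for pattern, mnemonic in PATTERNS:
        if matches(pattern, byte):
            return "reserved" if mnemonic == "-" else mnemonic
    raise ValueError("no pattern matches")
```

Each of the 256 possible instruction bytes matches exactly one row of the table, so the order of the rows does not affect the result.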

[8008] Table 355 is a summary of valid operands for each opcode. The table is ordered alphabetically by opcode mnemonic. The binary value for each operand can be found in the subsequent sections.

TABLE 355 Valid operands for opcodes

Opcode   Valid operands
ADD      immediate value
         (A0), offset
         (An), {C1, C2} [where n = 0-3]
AND      immediate value
         (A0), offset
DBR      {C1, C2}, offset
ERA
HALT
JMP      address
JPI
JPZ
JSI
JSR      address
JSZ
LIA      {Flash, Ram}, An [where n = 0-3], {immediate value}
LD       immediate value
         (A0), offset
         (An), {C1, C2} [where n = 0-3]
OR       immediate value
         (A0), offset
RIA      {Flash, Ram}, An [where n = 0-3]
ROR      {InByte, OutByte, WriteMask, ID, C1, C2, RB, XRB, 1, 3, 8, 24, 31}
RTS
SC       {C1, C2}, {immediate value}
ST       (A0), offset
         (An), {C1, C2} [where n = 0-3]
TBR      {0, 1}, offset
XOR      immediate value
         (A0), offset
         (An), {C1, C2} [where n = 0-3]

[8009] Additional pseudo-opcodes (for programming convenience) are as follows:

[8010] DEC=ADD 0xFF...

[8011] INC=ADD 0x01

[8012] NOT=XOR 0xFF...

[8013] LDZ=LD 0

[8014] SC {C1, C2}, Acc=ROR {C1, C2}

[8015] RD=ROR InByte

[8016] WR=ROR OutByte

[8017] LDMASK=ROR WriteMask

[8018] LDID=ROR ID

[8019] NOP=XOR 0

[8020] 9.2 Addressing Modes

[8021] The CPU supports a set of addressing modes as follows:

[8022] immediate

[8023] accumulator indirect

[8024] indirect fixed

[8025] indirect indexed

[8026] 9.2.1 Immediate

[8027] In this form of addressing, the operand itself supplies the 32-bit data.

[8028] Immediate addressing relies on 3 bits of operand, plus an optional 8 bits at PC+1 to determine an 8-bit base value. Bits 1-0 of the opcode byte determine whether the base value comes from the opcode byte itself, or from PC+1, as shown in Table 356.

TABLE 356 Selection for base value in immediate mode

Opcode₁₋₀   Base value
00          00000000
01          00000001
10          From PC + 1 (i.e. MIUData₇₋₀)
11          11111111

[8029] When the base value comes from the opcode byte itself, it is computed by using CMD₀ as bit 0, and copying CMD₁ into the upper 7 bits.

[8030] The resultant 8-bit base value is then used as a 32-bit value, either with 0s in the upper 24 bits, or with the 8-bit value replicated into the upper 24 bits. The selection is determined by bit 2 of the opcode byte, as follows:

TABLE 357 Replicate bits selection

Opcode₂   Data
0         No replication. Data has 0 in upper 24 bits and baseVal in lower 8 bits
1         Replicated. Data is 32-bit value formed by replicating baseVal.
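The base value selection and replication rules can be sketched as follows. This is a minimal illustration under stated assumptions rather than the chip's actual logic: the function name `immediate_value` and its parameter `next_byte` (standing for the byte at PC+1) are hypothetical.

```python
def immediate_value(opcode, next_byte=0):
    """Return the 32-bit immediate selected by bits 2-0 of the opcode byte."""
    select = opcode & 0b11                 # Table 356: source of the base value
    if select == 0b10:
        base = next_byte & 0xFF            # two-byte form: base value read from PC+1
    else:
        bit0 = opcode & 1
        bit1 = (opcode >> 1) & 1
        # Bit 0 is used directly; bit 1 is copied into the upper 7 bits.
        base = (0xFE if bit1 else 0x00) | bit0
    if opcode & 0b100:                     # Table 357: replicate into all four bytes
        return base * 0x01010101
    return base
```

For example, assuming the 11110xxx (LD immediate) form, an opcode byte of 0xF6 followed by 0x36 would yield 0x36363636, matching the `LD 0x36...` idiom used in the examples.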

[8031] Opcodes that support immediate addressing are LD, ADD, XOR, AND, OR. The SC and LIA instructions are also immediate in that they store the data with the opcode, but they are not in the same form as that described here. See the detail on the individual instructions for more information. Single byte examples include:

[8032] LD 0

[8033] ADD 1

[8034] ADD 0xFF... # this subtracts 1 from the acc

[8035] XOR 0xFF... # this performs an effective logical NOT operation

[8036] Double byte examples include:

[8037] LD 0x05 # a constant

[8038] AND 0x0F # isolates the lower nybble

[8039] LD 0x36... # useful for HMAC processing

[8040] 9.2.2 Accumulator Indirect

[8041] In this form of addressing, the Accumulator holds the effective address.

[8042] Opcodes that support Accumulator indirect addressing are JPI, JSI and ERA. In the case of JPI and JSI, the Accumulator holds the address to jump to. In the case of ERA, the Accumulator holds the address of the page in flash memory to be erased.

[8043] Examples include:

[8044] JPI

[8045] JSI

[8046] ERA

[8047] 9.2.3 Indirect Fixed

[8048] In this form of addressing, address register A0 is used as a base address, and then a specific fixed offset is added to the base address to give the effective address.

[8049] Bits 2-0 of the opcode byte specify the fixed offset from A0, which means the fixed offset has a range of 0 to 7.

[8050] Opcodes that support indirect fixed addressing are LD, ST, ADD, XOR, AND, OR.

[8051] Examples include:

[8052] LD (A0), 2

[8053] ADD (A0), 3

[8054] AND (A0), 4

[8055] ST (A0), 7

[8056] 9.2.4 Indirect Indexed

[8057] In this form of addressing, an address register is used as a base address, and then an index register is used to offset from that base address to give the effective address.

[8058] The address register is one of 4, and is selected via bits 2-1 of the opcode byte as follows:

TABLE 358 Address register selection

Opcode₂₋₁   Address register selected
00          A0
01          A1
10          A2
11          A3

[8059] Bit 0 of the opcode byte selects whether index register C1 or C2 is used:

[8060] The counter is selected as follows:

TABLE 359 Interpretation of counter for DBR

Opcode₀   Interpretation
0         C1
1         C2

[8061] Opcodes that support indirect indexed addressing are LD, ST, ADD, XOR.

[8062] Examples include:

[8063] LD (A2), C1

[8064] ADD (A1), C1

[8065] ST (A3), C2

[8066] Since C1 and C2 can only decrement, processing of data structures typically works by loading Cn with some number n and decrementing to 0. Thus (Ax),n is the first 32-bit word accessed, and (Ax),0 is the last 32-bit word accessed in the loop.
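The register and counter selection of Tables 358 and 359, and the descending access order just described, can be illustrated with a short sketch. The helper names `decode_indexed_operand` and `words_visited` are hypothetical and exist only for this example.

```python
def decode_indexed_operand(opcode):
    """Return (address register name, counter name) from bits 2-0 of the opcode."""
    reg = "A%d" % ((opcode >> 1) & 0b11)      # Table 358: bits 2-1 select A0-A3
    counter = "C2" if opcode & 1 else "C1"    # Table 359: bit 0 selects C1 or C2
    return reg, counter


def words_visited(n):
    """Word indices touched when a counter loaded with n is decremented to 0."""
    return list(range(n, -1, -1))
```

Under these assumptions, an opcode whose low three bits are 100 addresses (A2), C1, and a counter preloaded with 3 visits words 3, 2, 1 and 0 in that order.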

[8067] 9.3 ADD—Add to Accumulator

[8068] Mnemonic: ADD

[8069] Opcode: 1010xxxx, and 11101xxx

[8070] Usage: ADD effective-address, or ADD immediate-value

[8071] The ADD instruction adds the specified 32-bit value to the Accumulator via modulo 2³² addition.

[8072] The 11101xxx form of the opcode follows the immediate addressing rules (see Section 9.2.1 on page 946). The 1010xxxx form of the opcode defines an effective address as follows:

TABLE 360 Interpretation of operand for ADD (1010xxxx)

bit 3   Interpretation   Comment
0       (A0), offset     indirect fixed addressing (see Section 9.2.3 on page 948)
1       (An), Cn         indirect indexed addressing (see Section 9.2.4 on page 948)

[8073] The Z flag is also set during this operation, depending on whether the result (loaded into the Accumulator) is zero or not.

[8074] 9.4 AND—Bitwise AND

[8075] Mnemonic: AND

[8076] Opcode: 10000xxx, and 11010xxx

[8077] Usage: AND effective-address, or AND immediate-value

[8078] The AND instruction performs a 32-bit bitwise AND operation on the Accumulator.

[8079] The 11010xxx form of the opcode follows the immediate addressing rules (see Section 9.2.1 on page 946). The 10000xxx form of the opcode follows the indirect fixed addressing rules (see Section 9.2.3 on page 948).

[8080] The Z flag is also set during this operation, depending on whether the resultant 32-bit value (loaded into the Accumulator) is zero or not.

[8081] 9.5 DBR—Decrement and Branch

[8082] Mnemonic: DBR

[8083] OpCode: 0011xxxx

[8084] Usage: DBR Counter, Offset

[8085] This instruction provides the mechanism for building simple loops.

[8086] The counter is selected from bit 0 of the opcode byte as follows:

TABLE 361 Interpretation of counter for DBR

bit 0   Interpretation
0       C1
1       C2

[8087] If the specified counter is non-zero, then the counter is decremented and the designated offset is added to the current instruction address (PC for 1-byte instructions, PC+1 for 2-byte instructions). If the specified counter is zero, it is decremented (all bits in the counter become set) and processing continues at the next instruction (PC+1 or PC+2). The designated offset will typically be negative for use in loops.

[8088] The instruction is either 1 or 2 bytes, as determined by bits 3-1 of the opcode byte:

[8089] If bits 3-1=000, the instruction consumes 2 bytes. The 8 bits at PC+1 are treated as a signed number and used as the offset amount. Thus 0xFF is treated as −1, and 0x01 is treated as +1.

[8090] If bits 3-1≠000, the instruction consumes 1 byte. Bits 3-1 are treated as a negative number (the sign bit is implied) and used as the offset amount. Thus 111 is treated as −1, and 001 is treated as −7. This is useful for small loops.

[8091] The effect is that if the branch is back 1-7 bytes (1 byte is not particularly useful), then the single byte form of the instruction can be used. If the branch is forward, or backward more than 7 bytes, then the 2-byte instruction is required.
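The offset encoding and branch behaviour of DBR described above can be sketched as below. This is an illustrative model only: `dbr_offset`, `dbr_step` and the `counter_bits` parameter (3 for C1, 6 for C2) are names assumed for the example, not part of the specification.

```python
def dbr_offset(opcode, next_byte=0):
    """Return (instruction length in bytes, signed branch offset) for DBR."""
    field = (opcode >> 1) & 0b111            # bits 3-1 of the opcode byte
    if field == 0:
        # Two-byte form: the byte at PC+1 is a signed 8-bit offset.
        offset = next_byte - 256 if next_byte & 0x80 else next_byte
        return 2, offset
    # One-byte form: the sign bit is implied, giving offsets of -7 to -1.
    return 1, field - 8


def dbr_step(pc, counter, counter_bits, opcode, next_byte=0):
    """Return (next PC, new counter value) after executing DBR at address pc."""
    length, offset = dbr_offset(opcode, next_byte)
    mask = (1 << counter_bits) - 1
    new_counter = (counter - 1) & mask       # a zero counter wraps to all ones
    if counter != 0:
        # The offset is relative to PC (1-byte form) or PC+1 (2-byte form).
        return pc + (length - 1) + offset, new_counter
    return pc + length, new_counter
```

With a 3-bit counter holding 3 and a one-byte DBR whose bits 3-1 are 101, the model branches back 3 bytes and leaves the counter at 2; with the counter at 0 it falls through to PC+1 and the counter becomes 7.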

[8092] 9.6 ERA—Erase

[8093] Mnemonic: ERA

[8094] Opcode: 01110001

[8095] Usage: ERA

[8096] This instruction causes an erasure of the 256-byte page of flash memory pointed to by the Accumulator. The Accumulator is assumed to contain an 8-bit pointer to a 128-bit (16 byte) aligned structure (same structure as the address registers). The page number to be erased comes from bits 7-4, and the lower 4 bits are ignored.

[8097] Note that the size of the flash memory page being erased is actually 512 bytes, but in terms of data storage and addressing from the point of view of the CPU, there are only 256 bytes in the page.

[8098] 9.7 HALT—Halt CPU Operation

[8099] Mnemonic: HALT

[8100] Opcode: 01110101

[8101] Usage: HALT

[8102] The HALT instruction writes a 0 to the internal GO register, thereby causing the CPU to terminate the currently executing program. The CPU will only be restarted with a new localId transaction from the Master or by a globalId plus Active Mode byte.

[8103] 9.8 JMP—Jump

[8104] Mnemonic: JMP

[8105] Opcode: 0000xxxx

[8106] Usage: JMP effective-address

[8107] The JMP instruction provides a method of branching to a specified address. The instruction loads the PC with the effective address.

[8108] The new PC is loaded as follows: bits 11-8 are obtained from bits 3-0 of the JMP opcode byte, and bits 7-0 are obtained from PC+1.
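As a small worked example of this address formation, the 12-bit target can be assembled as shown below. The function name `jump_target` is hypothetical, and `next_byte` stands for the byte fetched from PC+1.

```python
def jump_target(opcode, next_byte):
    """Form the 12-bit PC from the JMP opcode byte and the byte at PC+1."""
    high = opcode & 0x0F          # opcode bits 3-0 become PC bits 11-8
    low = next_byte & 0xFF        # the byte at PC+1 becomes PC bits 7-0
    return (high << 8) | low
```

For instance, an opcode byte of 0x03 followed by 0x45 gives a target of 0x345.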

[8109] 9.9 JPI—Jump Indirect

[8110] Mnemonic: JPI

[8111] Opcode: 01100011

[8112] Usage: JPI

[8113] The JPI instruction loads the PC with the lower 12 bits of the Accumulator, and sets the PCRamSel register with bit 15 of the Accumulator. Note that the stack is unaffected (unlike JSI).

[8114] 9.10 JPZ—Jump to Zero

[8115] Mnemonic: JPZ

[8116] Opcode: 01100010

[8117] Usage: JPZ

[8118] The JPZ instruction loads the PC and PCRamSel with 0, thereby causing a jump to address 0 in Flash memory.

[8119] Programmers will not typically use the JPZ command. However, the CPU executes this instruction whenever a new command arrives over the serial interface, so that the code entry point is known, i.e. every time the chip receives a new command, execution begins at address 0 in flash. This does not change the status of any other internal register settings (e.g. the flash test registers).

[8120] 9.11 JSI—Jump Subroutine Indirect

[8121] Mnemonic: JSI

[8122] Opcode: 01110011

[8123] Usage: JSI

[8124] The JSI instruction allows jumping to a subroutine whose address is obtained from the Accumulator. The instruction pushes the current PC onto the stack, loads the PC with the lower 12 bits of the Accumulator, and sets the PCRamSel register with bit 15 of the Accumulator.

[8125] The stack provides for 12 levels of execution (11 subroutines deep). It is the responsibility of the programmer to ensure that this depth is not exceeded or the deepest return value will be overwritten (since the stack wraps). Programs can take advantage of the fact that the stack wraps.

[8126] 9.12 JSR—Jump Subroutine

[8127] Mnemonic: JSR

[8128] Opcode: 0001xxxx

[8129] Usage: JSR effective-address

[8130] The JSR instruction provides for the most common usage of the subroutine construct. The instruction pushes the current PC onto the stack, and loads the PC with the effective address.

[8131] The new PC is loaded as follows: bits 11-8 are obtained from bits 3-0 of the JSR opcode byte, and bits 7-0 are obtained from PC+1.

[8132] The stack provides for 12 levels of execution (11 subroutines deep). It is the responsibility of the programmer to ensure that this depth is not exceeded or the return value will be overwritten (since the stack wraps). Programs can take advantage of the fact that the stack wraps.

[8133] 9.13 JSZ—Jump to Subroutine at Zero

[8134] Mnemonic: JSZ

[8135] Opcode: 01110010

[8136] Usage: JSZ

[8137] The JSZ instruction jumps to the subroutine at flash address 0 (i.e. it pushes the current PC onto the stack, and loads the PC and PCRamSel with 0).

[8138] Programmers will not typically use the JSZ command. It exists merely as a result of opcode decoding minimization and can be used to assist with the testing of the chip.

[8139] 9.14 LD—Load Accumulator

[8140] Mnemonic: LD

[8141] Opcode: 1011xxxx, and 11110xxx

[8142] Usage: LD effective-address, or LD immediate-value

[8143] The LD instruction loads the Accumulator with the 32-bit value.

[8144] The 11110xxx form of the opcode follows the immediate addressing rules (see Section 9.2.1 on page 946). The 1011xxxx form of the opcode defines an effective address as follows:

TABLE 362 Interpretation of operand for LD (1011xxxx)

bit 3   Interpretation   Comment
0       (A0), offset     indirect fixed addressing (see Section 9.2.3 on page 948)
1       (An), Cn         indirect indexed addressing (see Section 9.2.4 on page 948)

[8145] The Z flag is also set during this operation, depending on whether the value loaded into the Accumulator is zero or not.

[8146] 9.15 LIA—Load Immediate Address

[8147] Mnemonic: LIA

[8148] Opcode: 01111xxx

[8149] Usage: LIAF AddressRegister, Value # for flash addresses

[8150] The LIA instruction transfers the data from PC+1 into the designated address register (A0-A3), and sets the memory mode bit for that address register.

[8151] Bit 0 specifies whether the address is in flash or ram, as follows:

TABLE 363 Interpretation of memory mode for LIA

bit 0   Interpretation
0       Flash
1       Ram

[8152] 9.16 OR—Bitwise OR

[8153] Mnemonic: OR

[8154] Opcode: 10001xxx, and 11011xxx

[8155] Usage: OR effective-address, or OR immediate-value

[8156] The OR instruction performs a 32-bit bitwise OR operation on the Accumulator.

[8157] The 11011xxx form of the opcode follows the immediate addressing rules (see Section 9.2.1 on page 946). The 10001xxx form of the opcode follows the indirect fixed addressing rules (see Section 9.2.3 on page 948).

[8158] The Z flag is also set during this operation, depending on whether the resultant 32-bit value (loaded into the Accumulator) is zero or not.

[8159] 9.17 RIA—Rotate in Address

[8160] Mnemonic: RIA

[8161] Opcode: 11111xxx

[8162] Usage: RIAF AddressRegister # for flash addresses

[8163] RIAR AddressRegister # for ram addresses

[8164] The RIA instruction transfers the lower 8 bits of the Accumulator into the designated address register (A0-A3), sets the memory mode bit for that address register, and rotates the Accumulator right by 8 bits.

[8165] Bit 0 specifies whether the address is in flash or ram, as follows:

TABLE 364 Interpretation of memory mode for RIA

bit 0   Interpretation
0       Flash
1       Ram

[8166] The address register to be targeted is selected via bits 2-1 of the instruction.

[8167] 9.18 ROR—Rotate Right

[8168] Mnemonic: ROR

[8169] Opcode: 1100xxxx

[8170] Usage: ROR Value

[8171] The ROR instruction provides a way of rotating the Accumulator right a set number of bits. The bit(s) coming in at the top of the Accumulator (to become bit 31) can either come from the previous lower bits of the Accumulator, from the serial connection, or from external flags. The bit(s) rotated out can also be output from the serial connection, or combined with an external flag.

[8172] The allowed operands are as follows:

TABLE 365 Interpretation of operand for ROR

bits 3-0   Interpretation
0000       RB
0001       XRB
0010       WriteMask
0011       1
0100       - (reserved)
0101       3
0110       31
0111       24
1000       C1
1001       C2
1010       - (reserved)
1011       - (reserved)
1100       8
1101       ID
1110       InByte
1111       OutByte

[8173] The Z flag is also set during this operation, depending on whether the resultant 32-bit value (loaded into the Accumulator) is zero or not.

[8174] In its simplest form, the operand for the ROR instruction is one of 1, 3, 8, 24, 31, indicating how many bit positions the Accumulator should be rotated. For these operands, there is no external input or output: the bits of the Accumulator are merely rotated right. Note that these values are the equivalent of rotating left 31, 29, 24, 8, 1 bit positions.

[8175] With operand WriteMask, the lower 8 bits of the Accumulator are transferred to the WriteMask register, and the Accumulator is rotated right by 1 bit. This conveniently allows successive nybbles to be masked during Flash writes if the Accumulator has been preloaded with an appropriate value (eg 1x01).

[8176] With operands C1 and C2, the lower appropriate number of bits of the Accumulator (3 for C1, 6 for C2) are transferred to the C1 or C2 register and the lower 6 bits of the Accumulator are loaded with the previous value of the Cn register. The remaining upper bits of the Accumulator are set as follows: bits 31-24 are copied from previous bits 7-0, and bits 23-6 are copied from previous bits 31-14 (effectively junk). As a result, the Accumulator should be subsequently masked if the programmer wants to compare for specific values.

[8177] With operand ID, the 7 low-order bits are transferred from the Accumulator to the LocalId register, the low-order 8 bits of the Accumulator are copied to the Trim register if the Trim register has not already been written to after power-on reset, and the Accumulator is rotated right by 8 bits. This means that the ROR ID instruction needs to be performed twice, typically during Global Active Mode: once to set Trim, and once to set LocalId. Note there is no way to read the contents of the LocalId or Trim registers directly. However, the LocalId sent to the program for a command is available as bits 7-1 of the first byte obtained from InByte after program startup.

[8178] With operand InByte, the next serial input byte is transferred to the highest 8 bits of the Accumulator. The InByteValid bit is also cleared. If there is no input byte available from the client yet, execution is suspended until there is one. The remainder of the Accumulator is shifted right 8 bit positions (bit 31 becomes bit 23 etc.), with the lowest bits of the Accumulator shifted out.

[8179] With operand OutByte, the Accumulator is shifted right 8 bit positions. The byte shifted out from bits 7-0 is stored in the OutByte register and the OutByteValid flag is set. It is therefore ready for a client to read. If the OutByteValid flag is already set, execution of the instruction stalls until the OutByteValid flag is cleared (when the OutByte byte has been read by the client). The new data shifted in to the upper 8 bits of the Accumulator is what was transferred to the OutByte register (i.e. from the Accumulator). Finally, the RB and XRB operands allow the implementation of LFSRs and multiple precision shift registers. With RB, the bit shifted out (formerly bit 0) is written to the RTMP register. The bit currently in the RTMP register becomes the new bit 31 of the Accumulator. Performing multiple ROR RB commands over several 32-bit values implements a multiple precision rotate/shift right. The XRB operand works in the same way as RB, in that the current value in the RTMP register becomes the new bit 31 of the Accumulator. However, with XRB, the bit formerly known as bit 0 does not simply replace RTMP (as in the RB instruction). Instead, it is XORed with RTMP, and the result stored in RTMP. This allows the implementation of long LFSRs, as required by the authentication protocol.
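The RB and XRB behaviour can be modelled in a few lines to show how a multiple precision shift falls out of repeated ROR RB operations. This is a sketch under assumptions: the function names and the convention of passing RTMP as an explicit argument are hypothetical, and the multi-word helper assumes words are processed most significant first.

```python
MASK32 = 0xFFFFFFFF


def ror_rb(acc, rtmp):
    """ROR RB: old RTMP becomes bit 31, old bit 0 replaces RTMP."""
    new_acc = ((acc >> 1) | (rtmp << 31)) & MASK32
    return new_acc, acc & 1


def ror_xrb(acc, rtmp):
    """ROR XRB: as RB, but old bit 0 is XORed into RTMP instead of replacing it."""
    new_acc = ((acc >> 1) | (rtmp << 31)) & MASK32
    return new_acc, rtmp ^ (acc & 1)


def shift_right_multiword(words, rtmp=0):
    """Shift a list of 32-bit words (most significant first) right by one bit."""
    shifted = []
    for word in words:
        # Each word hands its outgoing bit 0 to the next word via RTMP.
        word, rtmp = ror_rb(word, rtmp)
        shifted.append(word)
    return shifted, rtmp
```

Applied to the two words 0x00000001 and 0x00000000, the helper yields 0x00000000 and 0x80000000, which is the 64-bit value shifted right by one position.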

[8180] 9.19 RTS—Return From Subroutine

[8181] Mnemonic: RTS

[8182] Opcode: 01110100

[8183] Usage: RTS

[8184] The RTS instruction pulls the saved PC from the stack, adds 1, and resumes execution at the resultant address. The effect is to cause execution to resume at the instruction after the most recently executed JSR or JSI instruction.

[8185] Although 12 levels of execution are provided for (11 subroutines), it is the responsibility of the programmer to balance each JSR and JSI instruction with an RTS. An RTS executed with no previous JSR will cause execution to begin at whatever address happens to be pulled from the stack. Of course this may be desired behaviour in specific circumstances.

[8186] 9.20 SC—Set Counter

[8187] Mnemonic: SC

[8188] Opcode: 0100xxxx

[8189] Usage: SC Counter Value

[8190] The SC instruction is used to transfer a 3-bit Value into the specified counter. The operand determines which of counters C1 and C2 is to be loaded as well as the value to be loaded. Value is stored in bits 3-1 of the 8-bit opcode, and the counter is specified by bit 0 as follows:

TABLE 366 Interpretation of counter for SC

bit 0   Interpretation
0       C1
1       C2

[8191] Since counter C1 is 3 bits, Value is copied directly into C1.

[8192] For counter C2, C2₂₋₀ are copied to C2₅₋₃, and Value is copied to C2₂₋₀. Two SC C2 instructions are therefore required to load C2 with a given 6-bit value. For example, to load C2 with 0x0C, we would have SC C2 1 followed by SC C2 4.
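The two-step loading of C2 can be checked with a short model. The function names below are hypothetical and are used only to illustrate the bit movement just described.

```python
def set_counter_c1(value):
    """SC C1: the 3-bit Value is copied directly into C1."""
    return value & 0b111


def set_counter_c2(c2, value):
    """SC C2: old C2 bits 2-0 move to bits 5-3, and Value fills bits 2-0."""
    return ((c2 & 0b111) << 3) | (value & 0b111)


# Loading C2 with 0x0C (binary 001 100) takes two SC C2 instructions.
c2 = set_counter_c2(0, 1)     # C2 = 000 001
c2 = set_counter_c2(c2, 4)    # C2 = 001 100
```

Note that the previous upper three bits of C2 are discarded by each SC C2, so the result after two instructions does not depend on the counter's earlier contents.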

[8193] 9.21 ST—Store Accumulator

[8194] Mnemonic: ST

[8195] Opcode: 0101xxxx

[8196] Usage: ST effective-address

[8197] The ST instruction stores the 32-bit Accumulator at the effective address. The effective address is determined as follows:

TABLE 367 Interpretation of operand for ST (0101xxxx)

bit 3   Interpretation   Comment
0       (A0), offset     indirect fixed addressing (see Section 9.2.3 on page 948)
1       (An), Cn         indirect indexed addressing (see Section 9.2.4 on page 948)

[8198] If the effective address is in Flash memory, only those nybbles whose corresponding WriteMask bit is set will be written to Flash. Programmers should be very aware of flash characteristics (write time, longevity, page size etc.) when storing data in flash.

[8199] There is always the possibility that power could be removed during a write to Flash. If this occurs, the flash will be in an indeterminate state. If the QA Chip is warned by the external system that power is about to be removed (via the master causing a transition to Idle Mode), the write will be aborted cleanly at the nearest nybble boundary (writes occur in the order of least significant to most significant).

[8200] 9.22 TBR—Test and Branch

[8201] Mnemonic: TBR

[8202] Opcode: 0010xxxx

[8203] Usage: TBR Value Offset

[8204] The Test and Branch instruction tests the status of the Z flag (the zero-ness of the Accumulator), and then branches if a match occurs.

[8205] The zero-ness is selected from bit 0 of the opcode byte as follows:

TABLE 368 Interpretation of zero-ness for TBR

bit 0   Interpretation
0       true if Acc is zero (Z = 1)
1       true if Acc is non-zero (Z = 0)

[8206] If the specified zero-test matches, then the designated offset is added to the current instruction address (PC for 1-byte instructions, PC+1 for 2-byte instructions). If the zero-test does not match, processing continues at the next instruction (PC+1 or PC+2). The instruction is either 1 or 2 bytes, as determined by bits 3-1 of the opcode byte:

[8207] If bits 3-1=000, the instruction consumes 2 bytes. The 8 bits at PC+1 are treated as a signed number and used as the offset amount to be added to PC+1. Thus 0xFF is treated as −1, and 0x01 is treated as +1.

[8208] If bits 3-1≠000, the instruction consumes 1 byte. Bits 3-1 are treated as a positive number (the sign bit is implied) and used as the offset amount to be added to PC. Thus 111 is treated as 7, and 001 is treated as 1. This is useful for skipping over a small number of instructions.

[8209] The effect is that if the branch is forward 1-7 bytes (1 byte is not particularly useful), then the single byte form of the instruction can be used. If the branch is backward, or forward more than 7 bytes, then the 2-byte instruction is required.
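The TBR rules above can be summarised in a small model. As with the other sketches, this is illustrative only: `tbr_step` and its parameters are hypothetical names, with `z_flag` standing for the Z flag and `next_byte` for the byte at PC+1.

```python
def tbr_step(pc, z_flag, opcode, next_byte=0):
    """Return the next PC after executing TBR at address pc."""
    want_nonzero = opcode & 1                # Table 368: 0 tests Z = 1, 1 tests Z = 0
    field = (opcode >> 1) & 0b111            # bits 3-1 of the opcode byte
    if field == 0:
        # Two-byte form: signed 8-bit offset at PC+1, added to PC+1.
        length = 2
        offset = next_byte - 256 if next_byte & 0x80 else next_byte
    else:
        # One-byte form: implied positive sign, offsets of +1 to +7 added to PC.
        length, offset = 1, field
    matched = (z_flag == 0) if want_nonzero else (z_flag == 1)
    if matched:
        return pc + (length - 1) + offset
    return pc + length
```

In this model the one-byte form can only skip forward, which mirrors the one-byte DBR form that can only branch backward.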

[8210] 9.23 XOR—Bitwise Exclusive OR

[8211] Mnemonic: XOR

[8212] Opcode: 1001xxxx, and 11100xxx

[8213] Usage: XOR effective-address, or XOR immediate-value

[8214] The XOR instruction performs a 32-bit bitwise XOR operation on the Accumulator.

[8215] The 11100xxx form of the opcode follows the immediate addressing rules (see Section 9.2.1 on page 946). The 1001xxxx form of the opcode has an effective address as follows:

TABLE 369 Interpretation of operand for XOR (1001xxxx)

bit 3   Interpretation   Comment
0       (A0), offset     indirect fixed addressing (see Section 9.2.3 on page 948)
1       (An), Cn         indirect indexed addressing (see Section 9.2.4 on page 948)

[8216] The Z flag is also set during this operation, depending on whether the result (loaded into the Accumulator) is zero or not.

[8217] Implementation

[8218] 10 Introduction

[8219] This chapter provides the high-level definition of a CPU capable of implementing the functionality required of a QA Chip.

[8220] 10.1 Physical Interface

[8221] 10.1.1 Pin Connections

[8222] The pin connections are described in Table 370.

TABLE 370 Pin connections to QA Chip

Pin    Direction   Description
Vdd    In          Nominal voltage. If the voltage deviates from this by more than a fixed amount, the chip will RESET.
GND    In
SClk   In          Serial clock
SDa    In/Out      Serial data

[8223] The system operating clock SysClk is different to SClk. SysClk is derived from an internal ring oscillator based on the process technology. In the FPGA implementation SysClk is obtained via a 5th pin.

[8224] 10.1.2 Size and Cost

[8225] The QA Chip uses a 0.25 μm CMOS Flash process for an area of 1 mm², yielding a 10 cent manufacturing cost in 2002. A breakdown of area is listed in Table 371.

TABLE 371 Breakdown of Area for QA Chip

Approximate area (mm²)   Description
0.49                     8 KByte flash memory TSMC: SFC0008_08B9_HE (8K × 8-bits, erase page size = 512 bytes). Area = 724.688 μm × 682.05 μm.
0.08                     3072 bits of static RAM
0.38                     General logic
0.05                     Analog circuitry
1                        TOTAL (approximate)

[8226] Note that there is no specific test circuitry (scan chains or BIST) within the QA Chip (see Section 10.3.10 on page 965), so the total transistor count is as shown in Table 371.

[8227] 10.1.3 Reset

[8228] The chip performs a RESET upon power-up. In addition, tamper detection and prevention circuitry in the chip will cause the chip to either RESET or erase Flash memory (depending on the attack detected) if an attack is detected.

[8229] 10.2 Operating Speed

[8230] The base operating system clock SysClk is generated internally from a ring oscillator (process dependent). Since the frequency varies with operating temperature and voltage, the clock is passed through a temperature-based clock filter before use (see Section 10.3.3 on page 961). The frequency is built into the chip during manufacture, and cannot be changed. The frequency is in the range 7-14 MHz.

[8231] 10.3 General Manufacturing Comments

[8232] Manufacturing comments are not normally made when describing the architecture of a chip. However, in the case of the QA Chip, the physical implementation of the chip is very much tied to the security of the key. Consequently a number of specialized circuits and components are necessary for implementation of the QA Chip. They are listed here.

[8233] Flash process

[8234] Internal randomized clock

[8235] Temperature based clock filter

[8236] Noise generator

[8237] Tamper Prevention and Detection circuitry

[8238] Protected memory with tamper detection

[8239] Boot-strap circuitry for loading program code

[8240] Data connections in polysilicon layers where possible

[8241] OverUnderPower Detection Unit

[8242] No scan-chains or BIST

[8243] 10.3.1 Flash Process

[8244] The QA Chip is implemented with a standard Flash manufacturing process. It is important that a Flash process be used to ensure that good endurance is achieved (parts of the Flash memory can be erased/written many times).

[8245] 10.3.2 Internal Randomized Clock

[8246] To prevent clock glitching and external clock-based attacks, the operating clock of the chip should be generated internally. This can be conveniently accomplished by an internal ring oscillator. The length of the ring depends on the process used for manufacturing the chip.

[8247] Due to process and temperature variations, the clock needs to be trimmed to bring it into a range usable for timing of Flash memory writes and erases.

[8248] The internal clock should also contain a small amount of randomization to prevent attacks where light emissions from switching events are captured, as described below.

[8249] Finally, the generated clock must be passed through a temperature-based clock filter before being used by the rest of the chip (see Section 10.3.3 on page 961).

[8250] The normal situation for FET implementation for the case of a CMOS inverter (which involves a pMOS transistor combined with an nMOS transistor) is shown in FIG. 353.

[8251] During the transition, there is a small period of time where both the nMOS transistor and the pMOS transistor have an intermediate resistance. The resultant power-ground short circuit causes a temporary increase in the current, and in fact accounts for around 20% of current consumed by a CMOS device. A small amount of infrared light is emitted during the short circuit, and can be viewed through the silicon substrate (silicon is transparent to infrared light). A small amount of light is also emitted during the charging and discharging of the transistor gate capacitance and transmission line capacitance.

[8252] For circuitry that manipulates secret key information, such information must be kept hidden.

[8253] Fortunately, IBM's PICA system and LVP (laser voltage probe) both have a requirement for repeatability due to the fact that the photo emissions are extremely weak (one photon requires more than 10⁵ switching events). PICA requires around 10⁹ passes to build a picture of the optical waveform. Similarly the LVP requires multiple passes to ensure an adequate SNR.

[8254] Randomizing the clock stops repeatability (from the point of view of collecting information about the same position in time), and therefore reduces the possibility of this attack.

[8255] 10.3.3 Temperature Based Clock Filter

[8256] The QA Chip circuitry is designed to operate within a specific clock speed range. Although the clock is generated by an internal ring oscillator, the speed varies with temperature and power. Since the user supplies the temperature and power, it is possible for an attacker to attempt to introduce race-conditions in the circuitry at specific times during processing. An example of this is where a low temperature causes a clock speed higher than the circuitry is designed for. This may prevent an XOR from working properly, so that, of the two inputs, the first may always be returned. These styles of transient fault attacks are documented further in [1]. The lesson to be learned from this is that the input power and operating temperature cannot be trusted.

[8257] Since the chip contains a specific power filter, we must also filter the clock. This can be achieved with a temperature sensor that allows the clock pulses through only when the temperature range is such that the chip can function correctly.

[8258] The filtered clock signal would be further divided internally as required.

[8259] 10.3.4 Noise Generator

[8260] Each QA Chip should contain a noise generator that generates continuous circuit noise. The noise will interfere with other electromagnetic emissions from the chip's regular activities and add noise to the I_(dd) signal. Placement of the noise generator is not an issue on a QA Chip due to the length of the emission wavelengths.

[8261] The noise generator is used to generate electronic noise, multiple state changes each clock cycle, and as a source of pseudo-random bits for the Tamper Prevention and Detection circuitry (see Section 10.3.5 on page 962).

[8262] A simple implementation of a noise generator is a 64-bit maximal period LFSR seeded with a non-zero number.
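For illustration, a Fibonacci-style LFSR of the kind described can be sketched as follows. The specification does not state which feedback taps are used, so everything about the tap choice here is an assumption: the 4-bit taps (4, 3) are used because their maximal period of 15 is small enough to verify directly, while the 64-bit tap set (64, 63, 61, 60) is one commonly listed in published LFSR tap tables and should be treated as an unverified example. The function names are hypothetical.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR one step; taps are 1-based bit positions."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    # Shift left and insert the feedback bit at the bottom.
    return ((state << 1) | feedback) & ((1 << width) - 1)


def period(seed, taps, width):
    """Count steps until the state returns to the seed (small widths only).

    Assumes the highest bit position is among the taps, which makes the
    update reversible and guarantees the sequence returns to the seed.
    """
    state, steps = lfsr_step(seed, taps, width), 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps


# Assumed tap set for a 64-bit register; not taken from the specification.
TAPS_64 = (64, 63, 61, 60)
```

A non-zero seed matters because the all-zero state maps to itself, so a register seeded with zero would never produce any noise bits.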

[8263] 10.3.5 Tamper Prevention and Detection Circuitry

[8264] A set of circuits is required to test for and prevent physical attacks on the QA Chip. However, what is actually detected as an attack may not be an intentional physical attack. It is therefore important to distinguish between these two types of attacks in a QA Chip:

[8265] where you can be certain that a physical attack has occurred.

[8266] where you cannot be certain that a physical attack has occurred.

[8267] The two types of detection differ in what is performed as aresult of the detection. In the first case, where the circuitry can becertain that a true physical attack has occurred, erasure of flashmemory key information is a sensible action. In the second case, wherethe circuitry cannot be sure if an attack has occurred, there is stillcertainly something wrong. Action must be taken, but the action shouldnot be the erasure of secret key information. A suitable action to takein the second case is a chip RESET. If what was detected was an attackthat has permanently damaged the chip, the same conditions will occurnext time and the chip will RESET again. If, on the other hand, what wasdetected was part of the normal operating environment of the chip, aRESET will not harm the key.

[8268] A good example of an event that circuitry cannot have knowledgeabout, is a power glitch. The glitch may be an intentional attack,attempting to reveal information about the key. It may, however, be theresult of a faulty connection, or simply the start of a power-downsequence. It is therefore best to only RESET the chip, and not erase thekey. If the chip was powering down, nothing is lost.

[8269] If the System is faulty, repeated RESETs will cause the consumerto get the System repaired. In both cases the consumable is stillintact.

[8270] A good example of an event that circuitry can have knowledgeabout, is the cutting of a data line within the chIP. If this attack issomehow detected, it could only be a result of a faulty chip(manufacturing defect) or an attack. In either case, the erasure of thesecret information is a sensible step to take.

[8271] Consequently each QA Chip should have 2 Tamper DetectionLines—one for definite attacks, and one for possible attacks. Connectedto these Tamper Detection Lines would be a number of Tamper Detectiontest units, each testing for different forms of tampering. In addition,we want to ensure that the Tamper Detection Lines and Circuitsthemselves cannot also be tampered with.

[8272] At one end of the Tamper Detection Line is a source ofpseudo-random bits (clocking at high speed compared to the generaloperating circuitry). The Noise Generator circuit described above is anadequate source. The generated bits pass through two different paths—onecarries the original data, and the other carries the inverse of thedata. The wires carrying these bits are in the layer above the generalchip circuitry (for example, the memory, the key manipulation circuitryetc.). The wires must also cover the random bit generator. The bits arerecombined at a number of places via an XOR gate. If the bits aredifferent (they should be), a 1 is output, and used by the particularunit (for example, each output bit from a memory read should be ANDedwith this bit value). The lines finally come together at the Flashmemory Erase circuit, where a complete erasure is triggered by a 0 fromthe XOR. Attached to the line is a number of triggers, each detecting aphysical attack on the chIP. Each trigger has an oversize nMOStransistor attached to GND. The Tamper Detection Line physically goesthrough this nMOS transistor. If the test fails, the trigger causes theTamper Detect Line to become 0. The XOR test will therefore fail oneither this clock cycle or the next one (on average), thus RESETing orerasing the chIP.
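The XOR check on the Tamper Detection Line can be modelled in a few lines of Python. This is only a behavioural sketch: it assumes that a failed test pulls the data-carrying path to 0 while the inverse path keeps carrying the inverted bit, which is one reading of why the XOR fails on "this clock cycle or the next one (on average)". The function name chip_ok_stream is hypothetical.

```python
def chip_ok_stream(random_bits, test_passes):
    """Return the XOR (ChipOK) output for each clock cycle.

    random_bits -- pseudo-random bits from the noise generator
    test_passes -- False if a tamper trigger has pulled the line to 0
    """
    results = []
    for bit in random_bits:
        inverse = bit ^ 1                  # second path carries the inverse
        line = bit if test_passes else 0   # assumed: failed test forces a 0
        results.append(line ^ inverse)     # 1 = paths differ = OK
    return results
```

With an intact line every cycle yields 1. With a failed test the output drops to 0 as soon as the random source emits a 1, so detection is only delayed while the source happens to emit 0s.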

[8273] FIG. 349 illustrates the basic principle of a Tamper Detection Line in terms of tests and the XOR connected to either the Erase or RESET circuitry.

[8274] The Tamper Detection Line must go through the drain of an output transistor for each test, as illustrated by FIG. 350.

[8275] It is not possible to break the Tamper Detect Line since this would stop the flow of 1s and 0s from the random source. The XOR tests would therefore fail. As the Tamper Detect Line physically passes through each test, it is not possible to eliminate any particular test without breaking the Tamper Detect Line.

[8276] It is important that the XORs take values from a variety of places along the Tamper Detect Lines in order to reduce the chances of an attack. FIG. 351 illustrates the taking of multiple XORs from the Tamper Detect Line to be used in the different parts of the chip. Each of these XORs can be considered to be generating a ChipOK bit that can be used within each unit or sub-unit.

[8277] A typical usage would be to have an OK bit in each unit that is ANDed with a given ChipOK bit each cycle. The OK bit is loaded with 1 on a RESET. If OK is 0, that unit will fail until the next RESET. If the Tamper Detect Line is functioning correctly, the chip will either RESET or erase all key information. If the RESET or erase circuitry has been destroyed, then this unit will not function, thus thwarting an attacker.
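The latching behaviour of the per-unit OK bit can be sketched as follows. The class and method names are hypothetical; the model simply ANDs the stored bit with the incoming ChipOK value each cycle and uses it to gate data.

```python
class UnitOK:
    """Behavioural model of a unit's OK bit."""

    def __init__(self):
        self.ok = 1  # loaded with 1 on RESET

    def reset(self):
        self.ok = 1

    def clock(self, chip_ok):
        """AND the OK bit with this cycle's ChipOK value."""
        self.ok &= chip_ok
        return self.ok

    def gate(self, data_bit):
        """Data passing through the unit is forced to 0 once OK is 0."""
        return data_bit & self.ok
```

Once a single 0 has been seen on ChipOK the unit stays disabled, even if later cycles report 1, until the next RESET.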

[8278] The destination of the RESET and Erase line and associated circuitry is very context sensitive. It needs to be protected in much the same way as the individual tamper tests. There is no point generating a RESET pulse if the attacker can simply cut the wire leading to the RESET circuitry.

[8279] The actual implementation will depend very much on what is to be cleared at RESET, and how those items are cleared.

[8280] Finally, FIG. 352 shows how the Tamper Lines cover the noise generator circuitry of the chip. The generator and NOT gate are on one level, while the Tamper Detect Lines run on a level above the generator.

[8281] 10.3.6 Protected Memory with Tamper Detection

[8282] It is not enough to simply store secret information or program code in flash memory. The Flash memory and RAM must be protected from an attacker who would attempt to modify (or set) a particular bit of program code or key information. The mechanism used must be compatible with the Tamper Detection Circuitry (described above).

[8283] The first part of the solution is to ensure that the Tamper Detection Line passes directly above each flash or RAM bit. This ensures that an attacker cannot probe the contents of flash or RAM. A breach of the covering wire is a break in the Tamper Detection Line. The breach causes the Erase signal to be set, thus deleting any contents of the memory. The high frequency noise on the Tamper Detection Line also obscures passive observation.

[8284] The second part of the solution for flash is to always store the data with its inverse. In each byte, 4 bits contain the data, and 4 bits (the shadow) contain the inverse of the data. If both are 0, this is a valid erase state, and the value is 0. Otherwise, the memory is only valid if the 4 bits of shadow are the inverse of the main 4 bits. The reasoning is that it is possible to add electrons to flash via a FIB, but not take electrons away. If it is possible to change a 0 to 1, for example, it is not possible to do the same to its inverse, and therefore regardless of the sense of flash, an attack can be detected.
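A minimal sketch of the data-plus-shadow encoding is given below. The placement of the data in the upper 4 bits and the shadow in the lower 4 bits is an assumption made for illustration (the text does not say which half is which), and the names TamperDetected, encode_nybble and decode_byte are hypothetical.

```python
class TamperDetected(Exception):
    """Raised when a stored byte fails the shadow check."""


def encode_nybble(value):
    """Pack a 4-bit value with its inverse into one flash byte."""
    if not 0 <= value <= 0xF:
        raise ValueError("value must fit in 4 bits")
    return (value << 4) | (~value & 0xF)


def decode_byte(stored):
    """Return the 4-bit value, or raise if the shadow does not match."""
    data = (stored >> 4) & 0xF
    shadow = stored & 0xF
    if data == 0 and shadow == 0:
        return 0  # valid erase state, value is 0
    if shadow != (~data & 0xF):
        raise TamperDetected("shadow bits are not the inverse of the data")
    return data
```

For example, the value 0xA is stored as 0xA5. Forcing any single bit of that byte in one direction changes either the data or the shadow but not both consistently, so the check fails.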

[8285] The second part of the solution for RAM is to use a parity bit. The data part of the register can be checked against the parity bit (which will not match after an attack).
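The RAM parity check can be illustrated with the short sketch below. Even parity and an 8-bit default width are assumptions chosen for the example; the function names are hypothetical.

```python
def parity_bit(word, width=8):
    """Return the even-parity bit of the low `width` bits of word."""
    masked = word & ((1 << width) - 1)
    return bin(masked).count("1") & 1


def ram_write(word, width=8):
    """Return the (data, parity) pair that would be stored."""
    return word, parity_bit(word, width)


def ram_check(word, parity, width=8):
    """True if the stored parity still matches the stored data."""
    return parity_bit(word, width) == parity
```

A single flipped data bit changes the computed parity, so ram_check returns False for it. An attack that flips an even number of bits would not be caught by parity alone.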

[8286] The bits coming from Flash and RAM can therefore be validated by a number of test units (one per bit) connected to the common Tamper Detection Line. The Tamper Detection circuitry would be the first circuitry the data passes through (thus stopping an attacker from cutting the data lines).

[8287] In addition, the data and program code should be stored in different locations for each chip, so an attacker does not know where to launch an attack. Finally, XORing the data coming in and going to Flash with a random number that varies for each chip means that the attacker cannot learn anything about the key by setting or clearing an individual bit that has a probability of being the key (the inverse of the key must also be stored somewhere in flash).

[8288] Finally, each time the chip is called, every flash location is read before executing any program code. This allows the flash tamper detection to be activated in a common spot instead of when the data is actually used or program code executed. This reduces the ability of an attacker to know exactly what was written to.

[8289] 10.3.7 Boot-Strap Circuitry for Loading Program Code

[8290] Program code should be kept in protected flash instead of ROM, since ROM is subject to being altered in a non-testable way. A boot-strap mechanism is therefore required to load the program code into flash memory (flash memory is in an indeterminate state after manufacture).

[8291] The boot-strap circuitry must not be in a ROM; a small state-machine suffices. Otherwise the boot code could be trivially modified in an undetectable way.

[8292] The boot-strap circuitry must erase all flash memory, check to ensure the erasure worked, and then load the program code.

[8293] The program code should only be executed once the flash program memory has been validated via Program Mode.

[8294] Once the final program has been loaded, a fuse can be blown to prevent further programming of the chip.

[8295] 10.3.8 Connections in Polysilicon Layers Where Possible

[8296] Wherever possible, the connections along which the key or secret data flows should be made in the polysilicon layers. Where necessary, they can be in metal 1, but must never be in the top metal layer (containing the Tamper Detection Lines).

[8297] 10.3.9 OverUnder Power Detection Unit

[8298] Each QA Chip requires an OverUnder Power Detection Unit (PDU) to prevent Power Supply Attacks. A PDU detects power glitches and tests the power level against a Voltage Reference to ensure it is within a certain tolerance. The Unit contains a single Voltage Reference and two comparators. The PDU would be connected into the RESET Tamper Detection Line, thus causing a RESET when triggered.

[8299] A side effect of the PDU is that as the voltage drops during a power-down, a RESET is triggered, thus erasing any work registers.

[8300] 10.3.10 No Scan Chains or BIST

[8301] Test hardware on a QA Chip could very easily introduce vulnerabilities. In addition, due to the small size of the QA Chip logic, test hardware such as scan paths and BIST units could in fact take a sizeable chunk of the final chip, lowering yield and causing a situation where an error in the test hardware causes the chip to be unusable. As a result, the QA Chip should not contain any BIST or scan paths. Instead, the program memory must first be validated via the Program Mode mechanism, and then a series of program tests run to verify the remaining parts of the chip.

[8302] 11 Architecture

[8303] FIG. 389 shows a high level block diagram of the QA Chip. Note that the tamper prevention and detection circuitry is not shown.

[8304] 11.1 Analogue Unit

[8305] FIG. 390 shows a block diagram of the Analogue Unit. Blocks shown in yellow provide additional protection against physical and electrical attack and, depending on the level of security required, may optionally be implemented.

[8306] 11.1.1 Ring Oscillator

[8307] The operating clock of the chip (SysClk) is generated by an internal ring oscillator whose frequency can be trimmed to reduce the variation from 4:1 (due to process and temperature) down to 2:1 (temperature variations only) in order to satisfy the timing requirements of the Flash memory.

[8308] The length of the ring depends on the process used for manufacturing the chip. A nominal operating frequency range of 10 MHz is sufficient. This clock should contain a small amount of randomization to prevent attacks where light emissions from switching events are captured.

[8309] Note that this is different to the input SClk, which is the serial clock for external communication.

[8310] The ring oscillator is covered by both Tamper Detection and Prevention lines so that if an attacker attempts to tamper with the unit, the chip will either RESET or erase all secret information.

[8311] FPGA Note: the FPGA does not have an internal ring oscillator. An additional pin (SysClk) is used instead. This is replaced by an internal ring oscillator in the final ASIC.

[8312] 11.1.2 Voltage Reference

[8313] The voltage reference block maintains an output which is substantially independent of process, supply voltage and temperature. It provides a reference voltage which is used by the PDU and a reference current to stabilise the ring oscillator. It may also be used as part of the temperature based clock filter described in Section 10.3.3 on page 961.

[8314] 11.1.3 OverUnder Power Detection Unit

[8315] The OverUnder Power Detection Unit (PDU) is the same as that described in Section 10.3.9 on page 965.

[8316] The Under Voltage Detection Unit provides the signal PwrFailing which, if asserted, indicates that the power supply may be turning off. This signal is used to rapidly terminate any Flash write that may be in progress to avoid accidentally writing to an indeterminate memory location.

[8317] Note that the PDU triggers the RESET Tamper Detection Line only. It does not trigger the Erase Tamper Detection Line.

[8318] The PDU can be implemented with regular CMOS, since the key does not pass through this unit. It does not have to be implemented with non-flashing CMOS.

[8319] The PDU is covered by both Tamper Detection and Prevention lines so that if an attacker attempts to tamper with the unit, the chip will either RESET or erase all secret information.

[8320] 11.1.4 Power-On Reset and Tamper Detect Unit

[8321] The Power-on Reset unit (POR) detects a power-on condition and generates the PORstL signal that is fed to all the validation units, including the two inside the Tamper Detect Unit (TDU).

[8322] All other logic is connected to RstL, which is the PORstL gated by the VAL unit attached to the Reset tamper detection lines (see Section 10.3.5 on page 962) within the TDU. Therefore, if the Reset tamper line is asserted, the validation will drive RstL low, and can only be cleared by a power-down. If the tamper line is not asserted, then RstL=PORstL.

[8323] The TDU contains a second VAL unit attached to the Erase tamper detection lines (see Section 10.3.5 on page 962) within the TDU. It produces a TamperEraseOK signal that is output to the MIU (1=the tamper lines are all OK, 0=force an erasure of Flash).

[8324] 11.1.5 Noise Generator

[8325] The Noise Generator (NG) is the same as that described in Section 10.3.4 on page 961. It is based on a 64-bit maximal period LFSR loaded with a set non-zero bit pattern on RESET.

[8326] The NG must be protected by both Tamper Detection and Prevention lines so that if an attacker attempts to tamper with the unit, the chip will either RESET or erase all secret information.

[8327] In addition, the bits in the LFSR must be validated to ensure they have not been tampered with (i.e. a parity check). If the parity check fails, the Erase Tamper Detection Line is triggered.

[8328] Finally, all 64 bits of the NG are ORed into a single bit. If this bit is 0, the Erase Tamper Detection Line is triggered. This is because 0 is an invalid state for an LFSR.

[8329] 11.2 Trim Unit

[8330] The 8-bit Trim register within the Trim Unit has a reset value of 0x00 (to enable the flash reads to succeed even in the fastest process corners), and is written to either by the PMU during Trim Mode or by the CPU in Active Mode. Note that the CPU is only able to write once to the Trim register between power-on resets due to the TrimDone flag, which provides overloading of LocalidWE.

[8331] The reset value of Trim (0) means that the chip has a nominal frequency of 2.7 MHz-10 MHz. The upper end of the range applies when we cannot trim it lower than this (or we could allow some spread on the acceptable trimmed frequency, but this will reduce our tolerance to ageing, voltage and temperature, which is the range 7 MHz to 14 MHz). The 2.7 MHz value is determined by a chip whose oscillator runs at 10 MHz when the trim register is set to its maximum value, so then it must run at 2.7 MHz when trim=0. This is based on the non-linear frequency-current characteristic of the oscillator.

[8332] Chips found outside of these limits will be rejected.

[8333] The frequency of the ring oscillator is measured by counting cycles⁶, in the PMU, over the byte period of the serial interface. The frequency of the serial clock, SClk, and therefore the byte period will be accurately controlled during the measurement. The cycle count (Fmeas) at the end of the period is read over the serial bus and the Trim register updated (Trimval) from its power on default (POD) value. The steps are shown in FIG. 391. Multiple measure-read-trim cycles are possible to improve the accuracy of the trim procedure.

[8334] A single byte for both Fmeas and Trimval provides sufficient accuracy for measurement and trimming of the frequency. If the bus operates at 400 kHz, a byte (8 bits) can be sent in 20 μs. Dividing the maximum oscillator frequency, expected to be 20 MHz, by 2 results in a cycle count of 200, and the minimum frequency of 5 MHz gives a count of 50, resulting in a worst case accuracy of 2%.
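The arithmetic in the preceding paragraph can be checked with a short Python sketch. The function names are hypothetical; the figures (400 kHz bus, divide-by-2, 20 MHz and 5 MHz oscillator limits) are the ones quoted in the text.

```python
def byte_period_us(bus_hz):
    """Time to send one 8-bit byte on the serial bus, in microseconds."""
    return 8 * 1_000_000 / bus_hz


def cycle_count(osc_hz, bus_hz, divider=2):
    """Divided oscillator cycles counted during one byte period."""
    return (osc_hz * 8) // (divider * bus_hz)


def worst_case_resolution(min_count):
    """One count expressed as a fraction of the smallest cycle count."""
    return 1 / min_count


# Values quoted in the text
max_count = cycle_count(20_000_000, 400_000)   # 200, fits in one byte
min_count = cycle_count(5_000_000, 400_000)    # 50
resolution = worst_case_resolution(min_count)  # 0.02, i.e. 2%
```

Both counts fit in a single byte, and one count out of 50 corresponds to the stated 2% worst case.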

[8335] FIG. 392 shows a block diagram of the Trim Unit:

[8336] The 8-bit Trim value is used in the analog Trim Block to adjust the frequency of the ring oscillator by controlling its bias current. The two lsbs are used as a voltage trim, and the 6 msbs are used as a frequency trim.
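The split of the Trim register into its two fields can be written as a small helper. The function name split_trim is hypothetical.

```python
def split_trim(trim):
    """Split the 8-bit Trim value into (frequency_trim, voltage_trim).

    The 6 most significant bits form the frequency trim and the
    2 least significant bits form the voltage trim.
    """
    if not 0 <= trim <= 0xFF:
        raise ValueError("Trim is an 8-bit register")
    return trim >> 2, trim & 0b11
```

For instance, the reset value 0x00 gives (0, 0) and the maximum value 0xFF gives (63, 3).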

[8337] The analog Trim Clock circuit also contains a Temperature filter as described in Section 10.3.3 on page 961.

[8338] 11.3 IO Unit

[8339] The QA Chip acts as a slave device, accepting serial data from an external master via the IO Unit (IOU). Although the IOU actually transmits data over a 1-bit line, the data is always transmitted and received in 1-byte chunks.

[8340] The IOU receives commands from the master to place it in a specific operating mode, which is one of:

[8341] Idle Mode: is the startup mode for the IOU if the fuse has not yet been blown. Idle Mode is the mode where the QA Chip is waiting for the next command from the master. Input signals from the CPU are ignored.

[8342] Program Mode: is where the QA Chip erases all currently stored data in the Flash memory (program and secret key information) and then allows new data to be written to the Flash. The IOU stays in Program Mode until told to enter another mode.

[8343] Active Mode: is the startup mode for the IOU if the fuse has been blown (the program is safe to run). Active Mode is where the QA Chip allows the program code to be executed to process the master's specific command. The IOU returns to Idle Mode automatically when the command has been processed, or if the time taken between consuming input bytes (while the master is writing the data) or generating output bytes (while the master is reading the results) is too great.

[8344] Trim Mode: is where the QA Chip allows the generation and setting of a trim value to be used on the internal ring oscillator clock value. This must be done for safety reasons before a program can be stored in the Flash memory.

[8345] See Section 12 on page 970 for detailed information about the IOU.

[8346] 11.4 Central Processing Unit

[8347] The Central Processing Unit (CPU) block provides the majority of the circuitry of the 4-bit microprocessor. FIG. 393 shows a high level view of the block.

[8348] 11.5 Memory Interface Unit

[8349] The Memory Interface Unit (MIU) provides the interface to flash and RAM. The MIU contains a Program Mode Unit that allows flash memory to be loaded via the IOU, a Memory Request Unit that maps 8-bit and 32-bit requests into multiple byte based requests, and a Memory Access Unit that generates read/write strobes for individual accesses to the memory.

[8350] FIG. 394 shows a high level view of the MIU block.

[8351] 11.6 Memory Components

[8352] The Memory Components block isolates the memory implementation from the rest of the QA Chip. The entire contents of the Memory Components block must be protected from tampering. Therefore the logic must be covered by both Tamper Detection Lines. This is to ensure that program code, keys, and intermediate data values cannot be changed by an attacker. The 8-bit wide RAM also needs to be parity-checked.

[8353] FIG. 395 shows a high level view of the Memory Components block. It consists of 8 KBytes of flash memory and 3072 bits of parity checked RAM.

[8354] 11.6.1 RAM

[8355] The RAM block is shown here as a simple 96×32-bit RAM (plus parity included for verification).

[8356] The parity bit is generated during the write.

[8357] The RAM is in an unknown state after RESET, so program code cannot rely on RAM being 0 at startup.

[8358] The initial version of the ASIC has the RAM implemented by Artisan component RA1SH (96×32-bit RAM without parity). Note that the RAMOutEn port is active low, i.e. when 0, the RAM is enabled, and when 1, the RAM is disabled.

[8359] 11.6.2 Flash Memory

[8360] A single Flash memory block is used to hold all non-volatile data. This includes program code and variables. The Flash memory block is implemented by TSMC component SFC0008_08B9_HE [4], which has the following characteristics:

[8361] 8 K×8-bit main memory, plus 128×8-bit information memory

[8362] 512 byte page erase

[8363] Endurance of 20,000 cycles (min)

[8364] Greater than 100 years data retention at room temperature

[8365] Access time: 20 ns (max)

[8366] Byte write time: 20 μs (min)

[8367] Page erase time: 20 ms (min)

[8368] Device erase time: 200 ms (min)

[8369] Area of 0.494 mm² (724.66 μm×682.05 μm)

[8370] The FlashCtrl lines are the various inputs on the SFC0008_08B9_HE required to read and write bytes, erase pages and erase the device. A total of 9 bits are required (see [4] for more information).

[8371] Flash values are unchanged by a RESET. After manufacture, the Flash contents must be considered to be garbage. After an erasure, the Flash contents in the SFC0008_08B9_HE are all 1s.

[8372] 11.6.3 VAL Blocks

[8373] The two VAL units are validation units connected to the Tamper Prevention and Detection circuitry (described in Section 10.3.5 on page 962), each with an OK bit. The OK bit is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with each data bit that passes through the unit.

[8374] In the case of VAL₁, the effective byte output from the flash will always be 0 if the chip has been tampered with. This will cause shadow tests to fail, program code will not execute, and the chip will hang.

[8375] In the case of VAL₂, the effective byte from RAM will always be 0 if the chip has been tampered with, thus resulting in no temporary storage for use by an attacker.

[8376] 12 IO Unit

[8377] The I/O Unit (IOU) is responsible for providing the physical implementation of the logical interface described in Section 5.1 on page 933, moving between the various modes (Idle, Program, Trim and Active) according to commands sent by the master.

[8378] The IOU therefore contains the circuitry for communicating with the external world via the SClk and SDa pins. The IOU sends and receives data in 8-bit chunks. Data is sent serially, most significant bit (bit 7) first through to least significant bit (bit 0) last. When a master sends a command to a QA Chip, the command commences with a single byte containing an id in bits 7-1, and a read/write sense in bit 0, as shown in FIG. 396.

[8379] The IOU recognizes a global id of 0x00 and a local id of LocalId (set after the CPU has executed program code at reset or due to a global id/ActiveMode command on the serial bus). Subsequent bytes contain modal information in the case of global ID, and command/data bytes in the case of a match with the local id.
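The layout of the first command byte can be illustrated with a small decoder. This is a sketch only: the function name and the returned field names are hypothetical, and the read/write sense (bit 0 = 1 for a read, 0 for a write) follows the isRead/isWrite definitions given in the byte-oriented state machine pseudocode.

```python
GLOBAL_ID = 0x00


def parse_command_byte(byte, local_id):
    """Decode the first byte of a command into id and direction flags."""
    dev_id = (byte >> 1) & 0x7F   # id occupies bits 7-1
    is_read = bool(byte & 1)      # bit 0: 1 = read, 0 = write
    is_global = dev_id == GLOBAL_ID
    return {
        "id": dev_id,
        "is_read": is_read,
        "is_write": not is_read,
        "is_global": is_global,
        # A local id of 0 means no local id has been set yet.
        "is_local": (not is_global) and dev_id == local_id,
    }
```

For example, with a local id of 0x15 the byte 0x2B decodes as a local read, while 0x00 decodes as a global write.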

[8380] If the master sends data too fast, then the IOU will miss data, since the IOU never holds the bus.

[8381] The meaning of too fast depends on what is running. In Program Mode, the master must send data a little slower than the time it takes to write the byte to flash (actually written as 2×8-bit writes, or 40 μs). In ActiveMode, the master is permitted to send and request data at rates up to 500 kHz.

[8382] None of the latches in the IOU need to be parity checked since there is no advantage for an attacker to destroy or modify them.

[8383] The IOU outputs 0s and inputs 0s if either of the Tamper Detection Lines is broken. This will only come into effect if an attacker has disabled the RESET and/or erase circuitry, since breaking either Tamper Detection Line should result in a RESET or the erasure of all Flash memory.

[8384] The IOU's InByte, InByteValid, OutByte, and OutByteValid registers are used for communication between the master and the QA Chip. InByte and InByteValid provide the means for clients to pass commands and data to the QA Chip. OutByte and OutByteValid provide the means for the master to read data from the QA Chip.

[8385] Reads from InByte should wait until InByteValid is set. InByteValid will remain clear until the master has written the next input byte to the QA Chip. When the IOU is told (by the FEU or MU) that InByte has been read, the IOU clears the InByteValid bit to allow the next byte to be read from the client.

[8386] Writes to OutByte should wait until OutByteValid is clear. Writing OutByte sets the OutByteValid bit to signify that data is available to be transmitted to the master. OutByteValid will then remain set until the master has read the data from OutByte. If the master requests a byte but OutByteValid is clear, the IOU sends a NAck to indicate the data is not yet ready.
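The valid-flag handshake on InByte and OutByte can be modelled as a pair of one-byte mailboxes. The class and method names below are hypothetical, None stands in for "wait" or "NAck", and the overrun case (the master writing again before the previous byte has been read) is deliberately not modelled.

```python
class ByteMailbox:
    """Toy model of InByte/OutByte and their valid flags."""

    def __init__(self):
        self.in_byte = 0
        self.in_valid = False
        self.out_byte = 0
        self.out_valid = False

    def master_write(self, value):
        """Master delivers a byte; InByteValid is set."""
        self.in_byte = value & 0xFF
        self.in_valid = True

    def cpu_read(self):
        """Chip side reads InByte, or gets None if it must wait."""
        if not self.in_valid:
            return None
        self.in_valid = False  # cleared once InByte has been read
        return self.in_byte

    def cpu_write(self, value):
        """Chip side writes OutByte; returns False if it must wait."""
        if self.out_valid:
            return False
        self.out_byte = value & 0xFF
        self.out_valid = True
        return True

    def master_read(self):
        """Master reads OutByte, or gets None where a NAck would be sent."""
        if not self.out_valid:
            return None
        self.out_valid = False
        return self.out_byte
```

Each direction holds at most one byte at a time, which is why the chip-side code has to poll the valid flags before reading or writing.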

[8387] When the chip is reset via RstL, the IOU enters ActiveMode to allow the PMU to run to load the fuse. Once the fuse has been loaded (when MIUAvail transitions from 0 to 1) the IOU checks to see if the program is known to be safe. If it is not safe, the IOU reverts to IdleMode. If it is safe (FuseBlown=1), the IOU stays in ActiveMode to allow the program to load up the localId and do any other reset initialization, and will not process any further serial commands until the CPU has written a byte to the OutByte register (which may be read or not at the discretion of the master using a localId read). In both cases the master is then able to send commands to the QA Chip as described in Section 5.1 on page 933.

[8388] FIG. 397 shows a block diagram of the IOU.

[8389] With regard to InByteValid inputs, set has priority over reset, although both set and reset in correct operation should never be asserted at the same time. With regard to IOSetInByte and IOLoadInByte, if IOSetInByte is asserted, it will set InByte to be 0xFF regardless of the setting of IOLoadInByte.

[8390] The two VAL units are validation units connected to the Tamper Prevention and Detection circuitry (described in Section 10.3.5 of the Architecture Overview chapter), each with an OK bit. The OK bit is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with each data bit that passes through the unit.

[8391] In the case of VAL₁, the effective byte output from the chip will always be 0 if the chip has been tampered with. Thus no useful output can be generated by an attacker. In the case of VAL₂, the effective byte input to the chip will always be 0 if the chip has been tampered with. Thus no useful input can be chosen by an attacker.

[8392] There is no need to verify the registers in the IOU since an attacker does not gain anything by destroying or modifying them.

[8393] The current mode of the IOU is output as a 2-bit IOMode to allow the other units within the QA Chip to take correct action. IOMode is defined as shown in Table 372:

TABLE 372 IOMode values
Value  Interpretation
00  Idle Mode
01  Program Mode
10  Active Mode
11  Trim Mode

[8394] The Logic blocks generate a 1 if the current IOMode is in Program Mode, Active Mode or Trim Mode respectively. The logic blocks are:

Logic₁  IOMode = 01 (Program)
Logic₂  IOMode = 10 (Active)
Logic₃  IOMode = 11 (Trim)
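The IOMode encoding and the three decode blocks map naturally onto an enumeration. The Python names below are hypothetical; the 2-bit values are those of Table 372.

```python
from enum import IntEnum


class IOMode(IntEnum):
    IDLE = 0b00
    PROGRAM = 0b01
    ACTIVE = 0b10
    TRIM = 0b11


def logic_blocks(mode):
    """Return the outputs of (Logic1, Logic2, Logic3) for a given IOMode."""
    return (
        int(mode == IOMode.PROGRAM),  # Logic1
        int(mode == IOMode.ACTIVE),   # Logic2
        int(mode == IOMode.TRIM),     # Logic3
    )
```

At most one of the three outputs is 1 at any time, and all three are 0 in Idle Mode.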

[8395] 12.1 State Machine

[8396] There are two state machines in the IOU running in parallel. The first is a byte-oriented state machine, the second is a bit-oriented state machine. The byte-oriented state machine keeps track of the operating mode of the QA Chip while the bit-oriented state machine keeps track of the low-level bit Rx/Tx protocol.

[8397] The SDa and SClk lines are connected to the respective pads on the QA Chip. The IOU passes each of the signals from the pads through 2 D-types to compensate for metastability on input, and then a further latch and comparator to ensure that signals are only used if stable for 2 consecutive internal clock cycles. The circuit is shown in Section 12.1.1 below.
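The input conditioning described above can be approximated by the behavioural model below. It is a sketch under assumptions: the exact circuit is given in a figure not reproduced here, so the model simply uses two synchronising stages followed by a third stage, and only updates its output when the last two stages agree. All names are hypothetical.

```python
class InputSync:
    """Two D-type synchroniser stages plus a stability check."""

    def __init__(self):
        self.s1 = 0   # first D-type
        self.s2 = 0   # second D-type
        self.s3 = 0   # further latch holding the previous s2
        self.out = 0  # value presented to the rest of the IOU

    def clock(self, pad):
        """Advance one internal clock cycle with the given pad value."""
        # The output only follows the input when it was the same
        # on two consecutive cycles.
        if self.s2 == self.s3:
            self.out = self.s2
        self.s1, self.s2, self.s3 = pad & 1, self.s1, self.s2
        return self.out
```

In this model a level change appears at the output four clocks after it reaches the pad, and a glitch lasting a single clock never reaches the output at all.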

[8398] 12.1.1 Start/Stop Control Signals

[8399] The StartDetected and StopDetected control signals are generated based upon monitoring SDa synchronized to SClk. The StartDetected condition is asserted on the falling edge of SDa synchronized to SClk, and the StopDetected condition is asserted on the rising edge of SDa synchronized to SClk.

[8400] In addition we generate feSClk which is asserted on the falling edge of SClk, and reSClk which is asserted on the rising edge of SClk. Finally, feSclkPrev is the value of feSClk delayed by a single cycle. FIG. 398 shows the relationship of inputs and the generation of SDaReg, reSClk, feSClk, feSclkPrev, StartDetected and StopDetected.

[8401] The SDaRegSelect logic compensates for the 2:1 variation in clock frequency. It uses the length of the high period of the SClk (from the saturating counter) to select between sda5, sda6 and sda7 as the valid data from 300 ns before the falling edge of SClk as follows.

[8402] The minimum time for the high period of SClk is 600 ns. If the counter <=4 (i.e. 5 or fewer cycles with SClk=1) then SDaReg output=sda5 (sample point is equidistant from rising and falling edges). If the counter=5 or 6 (i.e. 6 or 7 samples where SClk=1), then SDaReg output=sda6. If the counter=7 (the counter saturates when there are 8 samples of SClk=1), then SDaReg output=sda7. This is shown in pseudocode below:

If ((counter₂ = 0) ∨ (counter = 4))
  SDaReg = sda5
ElseIf (counter = 7)
  SDaReg = sda7
Else
  SDaReg = sda6
EndIf
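The SDaRegSelect rule can be expressed directly as a function of the 3-bit saturating counter. The function names are hypothetical; the second helper shows that testing bit 2 of the counter, as the pseudocode does, is equivalent to the counter <= 4 condition in the prose.

```python
def select_sda_reg(counter, sda5, sda6, sda7):
    """Pick the delayed SDa sample according to the SClk high-period count."""
    if not 0 <= counter <= 7:
        raise ValueError("counter is a 3-bit saturating counter")
    if counter <= 4:
        return sda5
    if counter == 7:
        return sda7
    return sda6  # counter is 5 or 6


def uses_sda5(counter):
    """Bit-level form of the first test: bit 2 clear, or counter equal to 4."""
    return ((counter >> 2) & 1) == 0 or counter == 4
```

Passing distinct marker values for the three samples makes it easy to confirm which one is selected for each counter value.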

[8403] The counter also provides a means of enabling start and stop detection. There is a minimum of a 600 ns setup and 600 ns hold time for start and stop conditions. At 14 MHz this means samples 4 and 5 after the rising edge (sample 1 is considered to be the first sample where SClk=1) could potentially include a valid start or stop condition. At 7 MHz samples 4 and 5 represent 284 and 355 ns respectively, although this is after the rising edge of SClk, which itself is 100 ns after the setup of data (i.e. 384 and 455 ns respectively and therefore safe for sampling). Thus the data will be stable (although not a start or stop). Since we detect stops and starts using sda5 and sda6, we can only validly detect starts and stops 6 cycles after a rising edge, and we need to not-detect starts and stops 4 cycles before the falling edge. We therefore only detect starts and stops when the counter is >=6 (i.e. when sclk3 and sclk2 are 0 and 1 respectively, sda2 holds sample 1 coincident with the rising edge, sda1 holds sample 2, sda0 holds sample 3, and we load the counter with 0 and sample SDa to obtain the new sda0, which will hold sample 4 at the end of the cycle. Thus while the counter is incrementing from 0 to 1, sda0 will hold sample 4. Therefore sample 4 will be in sda6 when the counter is 6).

[8404] 12.1.2 Control of SDa and SClk Pins

[8405] The SClk line is always driven by the master. The SDa line is driven low whenever we want to transmit an ACK (SDa is active low) or a 0-bit from OutByte. The generation of the SDa pin is shown in the following pseudocode:

TxAck = (bitSM_state = ack) ∧ ((byteSM_state = doWrite) ∨ (((byteSM_state = getGlobalCmd) ∨ (byteSM_state = checkId)) ∧ AckCmd))
TxBit = (byteSM_state = doRead) ∧ (bitSM_state = xferBit) ∧ ¬OutByte[bitCount]
SDa = ¬(TxAck ∨ TxBit)  # only drive the line when we are xmitting a 0

[8406] The slew rate of the SDa line should be restricted to minimise ground bounce. The pad must guarantee a fall time >20 ns. The rise time will be controlled by the external pull up resistor and bus capacitance.

[8407] 12.1.3 Bit-Oriented State Machine

[8408] The bit-oriented state machine keeps track of the general flow of serial transmission including start/data/ack/stop as shown in the following pseudocode:

idle
    EndByte = FALSE
    EndAck = FALSE
    If (StartDetected)
        state ← starting
    Else
        state ← idle
    EndIf
starting
    EndByte = FALSE
    EndAck = FALSE
    NAck ← 0
    If (StopDetected)
        state ← idle
    ElseIf (feSClkPrev)
        bitCount ← 0
        state ← xferBit
    Else
        state ← starting # includes StartDetected
    EndIf
xferBit
    EndAck = FALSE
    EndByte = (feSclkPrev ∧ (bitCount = 0)) # after feSclk bitCount must be 1..8
    If (feSClk)
        shiftLeft [ioByte, SDaReg] # capture the bit in the ioByte shift register
        bitCount ← bitCount + 1 # modulo count due to 3 bit bitCount
    EndIf
    If (StopDetected)
        state ← idle
    ElseIf (StartDetected)
        state ← starting
    ElseIf (EndByte)
        state ← ack
    Else
        state ← xferBit
    EndIf
ack
    EndByte = FALSE
    EndAck = feSclkPrev
    If (StopDetected)
        state ← idle
    ElseIf (StartDetected)
        state ← starting
    ElseIf (EndAck)
        state ← xferBit # bitCount is already 0
    Else
        If (feSClk)
            NAck ← SDaReg # active low, so 0 = ACK, 1 = NACK
        EndIf
        state ← ack
    EndIf
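The bit-oriented state machine can be exercised with a small software model. The following Python sketch is a simplified interpretation, not the hardware design: the class name, the `step` method and its keyword arguments are hypothetical, one call to `step` stands for one internal clock cycle, and the edge-detect inputs (`fe_sclk`, `fe_sclk_prev`) are supplied by the caller rather than derived from a sampled SClk.

```python
class BitSM:
    """Software model of the bit-oriented state machine (illustrative)."""

    def __init__(self):
        self.state = "idle"
        self.bit_count = 0      # 3-bit counter, wraps from 7 to 0
        self.io_byte = 0        # 8-bit shift register
        self.nack = 0
        self.end_byte = False
        self.end_ack = False

    def step(self, start=False, stop=False, fe_sclk=False,
             fe_sclk_prev=False, sda_reg=1):
        self.end_byte = False
        self.end_ack = False
        if self.state == "idle":
            self.state = "starting" if start else "idle"
        elif self.state == "starting":
            self.nack = 0
            if stop:
                self.state = "idle"
            elif fe_sclk_prev:
                self.bit_count = 0
                self.state = "xferBit"
        elif self.state == "xferBit":
            # EndByte uses the bitCount value from before this cycle's update.
            self.end_byte = fe_sclk_prev and self.bit_count == 0
            if fe_sclk:
                self.io_byte = ((self.io_byte << 1) | (sda_reg & 1)) & 0xFF
                self.bit_count = (self.bit_count + 1) & 0x7
            if stop:
                self.state = "idle"
            elif start:
                self.state = "starting"
            elif self.end_byte:
                self.state = "ack"
        elif self.state == "ack":
            self.end_ack = fe_sclk_prev
            if stop:
                self.state = "idle"
            elif start:
                self.state = "starting"
            elif self.end_ack:
                self.state = "xferBit"
            elif fe_sclk:
                self.nack = sda_reg & 1   # 0 = ACK, 1 = NACK


def receive_byte(sm, value):
    """Clock eight bits into the model, most significant bit first."""
    for i in range(7, -1, -1):
        sm.step(fe_sclk=True, sda_reg=(value >> i) & 1)   # falling edge
        sm.step(fe_sclk_prev=True)                        # cycle after it
```

A typical use is to call `step(start=True)`, then `step(fe_sclk_prev=True)` for the first falling edge after the start condition, and then `receive_byte`; after the eighth bit the model is in the ack state with the received value in `io_byte`.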

[8409] 12.1.4 Byte-Oriented State Machine

[8410] The following pseudocode illustrates the general startup state of the IOU and the receipt of a transmission from the master.

rstL # setup state of registers on reset
    IOMode ← ActiveMode # to force the fuse to be loaded
    OutByteValid ← 0
    OutByte ← 0
    InByteValid ← 1 # required
    InByte ← 0xFF # byte = FF = the ‘reset’ command
    localId ← 0 # loads localId with the globalId so no localId exists
    state ← wait4fuse
wait4fuse
    If (MIUAvail)
        If (FuseBlown) # this must be done same cycle as seeing MIUAvail go high
            state ← wait4cpu
        Else
            IOMode ← IdleMode # CPU will now require an external ActiveMode to start
            state ← idle
        EndIf
    Else
        state ← wait4fuse
    EndIf
wait4cpu
    If (CPUOutByteWE) # wait for CPU reset activities to finish
        state ← idle # note: we're still in ActiveMode
    Else
        state ← wait4cpu
    EndIf
idle
    If (StartDetected)
        state ← checkId
    Else
        state ← idle
    EndIf

[8411] The first byte received must be checked to ensure it is meant for everyone (globalId of 0) or specifically for us (localId matches). We only send an ACK to a read when there is data available to send. In addition, writes to the general call address (0) are always ACKed, but reads from the general call address are only ACKed before the fuse has been blown. checkId isWrite = (ioByte₀ = 0) isRead = (ioByte₀ = 1) isGlobal = (ioByte⁷⁻¹ = 0) globalW = isGlobal

isWrite localW = (ioByte⁷⁻¹ = localID)

isWrite

isGlobal localR = (ioByte⁷⁻¹ = localID)

isRead

(

GlobalW

FuseBlown) If (StopDetected) state

idle ElseIf (EndByte) AckCmd_in = (globalW

localW)

(localR

OutByteValid) AckCmd

AckCmd_in If (localW) IOMode

IdleMode # jic - any output was pending IOOutByteUsed = 1 IOClearInByte= 1 # ensure there is nothing hanging around from before EndIf ElseIf(EndAck) If (globalW) # globalW and localW are mutually exclusive state

getGlobalCmd ElseIf (localW) IOMode

ActiveMode IOLoadInByte = 1 # will set inByte to localW (lsb will be 0)state

doWrite ElseIf (localR

IOMode₁

AckCmd) # Active mode (or Trim when fuse intact) state

doRead Else state

idle # ignore reads unless first in active or trim mode EndIf Else state

checkId EndIf

[8412] With a new global command the IOU waits for the mode byte (see Table page6 on page 934) to determine the new operating mode: getGlobalCmd wantProg = ((ioByte = ProgramModeId)

FuseBlown) wantTrim = ((ioByte = TrimModeId)

FuseBlown) wantActive = (ioByte = ActiveModeId) If (StopDetected) state

idle ElseIf (StartDetected) state

checkId ElseIf (EndByte) AckCmd_in = wantActive

wantProg

wantTrim # only ACK cmds we can do AckCmd

AckCmd_in If (AckCmd_in) IOMode

IdleMode # jic - any output was pending IOOutByteUsed = 1 IOClearInByte= 1 # ensure there is nothing hanging around from before EndIf ElseIf(EndAck) If (wantProg) IOMode

ProgramMode # don't load inByte (we only want the data) state

doWrite ElseIf (wantTrim) IOMode

TrimMode # don't load InByte (we only want the next byte) state

doWrite ElseIf (wantActive) # must be Active IOMode

ActiveMode IOSetInByte = 1 # 0 for all other cases & states. 1 = setsinByte to 0xFF IOLoadInByte = 1 # sets InByteValid (InByte is set to0xFF (‘reset’ cmd)) state

wait4cpu# don't do anything til the cpu has completed this task Elsestate

idle # unknown id, so ignore remainder EndIf Else state

getGlobalCmd EndIf

[8413] When the master writes bytes to the QA Chip (e.g. parameters for a command), the program must consume the byte fast enough (i.e. during the sending of the ACK) or subsequent bits may be lost.

[8414] The process of receiving bytes is shown in the following pseudocode:

doWrite
    If (StopDetected)
        state ← idle # stay in whatever IOMode we were in
    ElseIf (StartDetected)
        state ← checkId
    Else
        If (EndByte)
            IOLoadInByte = ¬InByteValid
        EndIf
        If (EndByte ∧ InByteValid) # will only be when master sends data too quickly
            state ← idle # ACK will not be sent when in idle state
        Else
            state ← doWrite # ACK will be sent automatically after byte is Rxed
        EndIf
    EndIf

[8415] When the master wants to read, the IOU sends one byte at a time as requested. The process is shown in the following pseudocode: doRead If (StopDetected) state

idle ElseIf (StartDetected) state

checkId ElseIf (EndAck) If (NAck

OutByteValid) state

idle Else state

doRead EndIf Else If (EndByte) IOOutByteUsed = 1 EndIf state

doRead EndIf

[8416] 13 Fetch and Execute Unit

[8417] 13.1 Introduction

[8418] The QA Chip does not require the high speeds and throughput of a general purpose CPU. It must operate fast enough to perform the authentication protocols, but not faster. Rather than have specialized circuitry for optimizing branch control or executing opcodes while fetching the next one (and all the complexity associated with that), the state machine adopts a simplistic view of the world. This helps to minimize design time as well as reducing the possibility of error in implementation.

[8419] The FEU is responsible for generating the operating cycles of the CPU, stalling appropriately during long command operations due to memory latency.

[8420] When a new transaction begins, the FEU will generate a JPZ (jump to zero) instruction.

[8421] The general operation of the FEU is to generate sets of cycles:

[8422] Cycle 0: fetch cycles. This is where the opcode is fetched from the program memory, and the effective address from the fetched opcode is generated. The Fetch output flag is set during the final cycle 0 (i.e. when the opcode is finally valid).

[8423] Cycle 1: execute cycle. This is where the operand is (potentially) looked up via the generated effective address (from Cycle 0) and the operation itself is executed. The Exec output flag is set during the final cycle 1 (i.e. when the operand is finally valid).

[8424] Under normal conditions, the state machine generates multiple Cycle=0 followed by multiple Cycle=1. This is because the program is stored in flash memory, and may take multiple cycles to read. In addition, writes to and erasures of flash memory take differing numbers of cycles to perform. The FEU will stall, generating multiple instances of the same Cycle value with Fetch and Exec both 0 until the input MIURdy=1, whereupon a Fetch or Exec pulse will be generated in that same cycle.

[8425] There are also two cases for stalling due to serial I/O operations:

[8426] The opcode is ROR OutByte, and OutByteValid=1. This means that the current operation requires outputting a byte to the master, but the master hasn't read the last byte yet.

[8427] The operation is ROR InByte, and InByteValid=0. This means that the current operation requires reading a byte from the master, but the master hasn't supplied the byte yet.

[8428] In both these cases, the FEU must stall until the stalling condition has finished.

[8429] Finally, the FEU must stop executing code if the IOU exits ActiveMode.

[8430] The local Cmd opcode/operand latch needs to be parity-checked. The logic and registers contained in the FEU must be covered by both Tamper Detection Lines. This is to ensure that the instructions to be executed are not changed by an attacker.

[8431] 13.2 State Machine

[8432] The Fetch and Execute Unit (FEU) is combinatorial logic with the following registers:

TABLE 373 FEU Registers
Name | #bits | Description
Output registers (visible outside the FEU)
Cycle | 1 | 0 if the FEU is currently fetching an opcode, 1 if the FEU is currently executing the opcode.
NewMemTrans | 1 | Is asserted during the start of a potential new memory access. 0 = this is not the first cycle of a set of Cycle 0 or Cycle 1. 1 = this is the first cycle of a set of Cycle 0 or Cycle 1 (previous cycle must have been a Fetch or an Exec).
Go | 1 | 1 if the FEU is currently fetching and executing program code (i.e. a program is currently running), 0 if it is not.
Local registers (not visible outside the FEU)
CurrCmd | 8 + p | Holds the currently executing instruction (parity checked).
PendingKill | 1 | The currently executing program is waiting to be halted (waiting due to memory access).
PendingStart | 1 | A new transaction is waiting to be started (waiting due to memory access or an existing transaction not yet stopped).
WasIdle | 1 | The previous cycle had an IOMode of IdleMode.

[8433] In addition, the following externally visible outputs are generated asynchronously:

TABLE 374 Externally visible asynchronous FEU outputs
Name | #bits | Description
Fetch | 1 | 1 if the FEU is performing the final cycle of a fetch (i.e. Cycle will also be 0). It is set when the NextCmd output is valid. The local Cmd register is latched during the Fetch cycle with either the incoming MIU8Data or an FEU-generated command.
Exec | 1 | 1 if the FEU is performing the final cycle of an execute (i.e. Cycle will also be 1). It is set when the data required by the opcode from the MIU is valid. Other units can execute the Cmd and latch data from the MIU (e.g. from MIUData) during the Exec cycle.
Cmd | 8 | When Cycle = 0, this holds the next instruction to be executed (during the next Cycle = 1). Is generated based on incoming MIU8Data or substituted FEU command (e.g. JSR 0). When Cycle = 1, this holds the current instruction being executed (based on the Cmd).

[8434] The Cycle and currCmd registers are not used directly. Instead, their outputs are passed through a VAL unit before use. The VAL units are designed to validate the data that passes through them. Each contains an OK bit connected to both Tamper Prevention and Detection Lines. The OK bit is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with each data bit that passes through the unit.

[8435] In the case of VAL₁, the effective Cycle will always be 0 if the chip has been tampered with. Thus no program code will execute.

[8436] In the case of VAL₂, the effective 8-bit currCmd value will always be 0 if the chip has been tampered with. Multiple 0s will be interpreted as the JSR 0 instruction, and this will effectively hang the CPU. VAL₂ also performs a parity check on the bits from currCmd to ensure that currCmd has not been tampered with. If the parity check fails, the Erase Tamper Detection Line is triggered. For more information on Tamper Prevention and Detection circuitry, see Section 10.3.5 on page 962.
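The masking behaviour of a VAL unit can be illustrated with a short Python sketch. This is a behavioural model under stated assumptions: the function name and arguments are hypothetical, the OK bit is taken as an input rather than derived from the Tamper Detection Lines, and even parity is assumed because the text does not state which parity scheme is used.

```python
def val_unit(data, width, ok, parity_bit=None):
    """Gate `data` with the OK bit and optionally check parity.

    Returns (gated_data, parity_fail). Even parity is an assumption.
    """
    mask = (1 << width) - 1
    # ANDing every data bit with OK forces the output to 0 when OK is 0.
    gated = (data & mask) if ok else 0
    parity_fail = False
    if parity_bit is not None:
        ones = bin(data & mask).count("1") + (parity_bit & 1)
        parity_fail = (ones % 2) != 0
    return gated, parity_fail
```

With `ok` false, an 8-bit currCmd is forced to 0 regardless of the stored value, which corresponds to the behaviour described for VAL₂; a `parity_fail` of true stands for the condition that would trigger the Erase Tamper Detection Line.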

[8437] 13.2.1 Pseudocode

reset conditions:
    Fetch = 0
    Exec = 0
    Cycle ← 0
    currCmd ← 0
    Go ← 0
    pendingKill ← 0
    pendingStart ← 0
    newMemTrans ← 0
    wasIdle ← 1 # required to detect if IOU starts in a non-idle state

[8438] The cycle by cycle combinatorial logic behaviour is shown in the following pseudocode: isActive = (IOMode = ActiveMode) wasIdle

(IOMode = IdleMode) wantToStart = (pendingStart

wasIdle)

isActive newTrans = wantToStart

Go

MIUAvail pendingStart

wantToStart

newTrans killTrans = Go

(

isActive

pendingKill) Fetch = newTrans

(Go

Cycle

MIURdy

killTrans) inDelay = (currCmd = ROR InByte)

InByteValid outDelay = (currCmd = ROR OutByte)

OutByteValid ioDelay = inDelay

outDelay Exec = Go

Cycle

MIURdy

ioDelay If (Cycle) Cmd = currCmd ElseIf (newTrans) Cmd = JPZ # jump to 0Else Cmd = MIU8Data EndIf resetGo = (MIURdy

killTrans)

(Fetch

(Cmd = HALT)) pendingKill

killTrans

resetGo changeCycle = Fetch

Exec # will only be 1 when Go = 1 Cycle

newTrans

((Cycle ⊕ changeCycle)

resetGo) newMemTrans

newTrans

(changeCycle

resetGo) If (Fetch) currCmd

Cmd EndIf If (resetGo) Go

0 ElseIf (newTrans) Go

1 EndIf

[8439] 14 ALU

[8440] The Arithmetic Logic Unit (ALU) contains a 32-bit Acc (Accumulator) register as well as the circuitry for simple arithmetic and logical operations.

[8441] The logic and registers contained in the ALU must be covered by both Tamper Detection Lines. This is to ensure that keys and intermediate calculation values cannot be changed by an attacker. In addition, the Accumulator must be parity-checked.

[8442] A 1-bit Z signal represents the state of zero-ness of the Accumulator. The Accumulator is cleared to 0 upon a RstL, and the Z signal is set to 1. The Accumulator is updated for any of the commands: AND, OR, XOR, ADD, ROR, and RIA, and the Z signal is updated whenever the Accumulator is updated. Note that the Z signal is actually implemented as a nonZ register whose output is passed through an inverter and used as Z.

[8443] Each arithmetic and logical block operates on two 32-bit inputs: the current value of the Accumulator, and the current 32-bit output of the DataSel block (either the 32 bit value from MIUData or an immediate value). The AND, OR, XOR and ADD blocks perform the standard 32-bit operations. The remaining blocks are outlined below.

[8444] FIG. 399 shows a block diagram of the ALU:

[8445] The Accumulator is updated for all instructions where the high bit of the opcode is set: Logic₁ = Exec ∧ Cmd₇

[8446] Since the WriteEnables of Acc and nonZ take Cmd₇ and Exec into account (due to Logic₁), these two bits are not required by the multiplexor MX₁ in order to select the output. The output selection for MX₁ only requires bits 6-3 of the Cmd and is therefore simpler as a result (as shown in Table 375).

TABLE 375 Selection for multiplexor MX₁
MX₁ Output | Cmd⁶⁻³
immOut | 011x or 1110 (LD)
rorOut | 100x or 1111 (RIA, ROR)
from XOR | 001x or 1100 (XOR)
from ADD | 010x or 1101 (ADD)
from AND | 0000 or 1010 (AND)
from OR | 0001 or 1011 (OR)
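The selection in Table 375 can be expressed as a small decode function. The Python sketch below is illustrative only; the function name and the returned strings are hypothetical labels for the six multiplexor inputs, and only bits 6-3 of the command are examined, as the table describes.

```python
def mx1_select(cmd):
    """Return the MX1 input selected by bits 6-3 of an 8-bit command."""
    sel = (cmd >> 3) & 0xF                      # Cmd bits 6-3
    if sel in (0b0110, 0b0111, 0b1110):
        return "immOut"                         # 011x or 1110 (LD)
    if sel in (0b1000, 0b1001, 0b1111):
        return "rorOut"                         # 100x or 1111 (RIA, ROR)
    if sel in (0b0010, 0b0011, 0b1100):
        return "XOR"                            # 001x or 1100
    if sel in (0b0100, 0b0101, 0b1101):
        return "ADD"                            # 010x or 1101
    if sel in (0b0000, 0b1010):
        return "AND"                            # 0000 or 1010
    return "OR"                                 # 0001 or 1011
```

The twelve patterns in the table cover all sixteen values of the 4-bit field, so the final return is reached only for 0001 and 1011.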

[8447] The two VAL units are validation units connected to the Tamper Prevention and Detection circuitry (described in Section 10.3.5 on page 962), each with an OK bit. The OK bit is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with each data bit that passes through the unit.

[8448] In the case of VAL₁, the effective bit output from the Accumulator will always be 0 if the chip has been tampered with. This prevents an attacker from processing anything involving the Accumulator. VAL₁ also performs a parity check on the Accumulator, setting the Erase Tamper Detection Line if the check fails.

[8449] In the case of VAL₂, the effective Z status of the Accumulator will always be true if the chip has been tampered with. Thus no looping constructs can be created by an attacker.

[8450] 14.1 DataSel Block

[8451] The DataSel block is designed to implement the selection between the MIU32Data and the immediate addressing mode for logical commands.

[8452] Immediate addressing relies on 3 bits of operand, plus an optional 8 bits at PC+1 to determine an 8-bit base value. Bits 0 to 1 determine whether the base value comes from the opcode byte itself, or from PC+1, as shown in Table 376.

TABLE 376 Selection for base value in immediate mode
Cmd¹⁻⁰ | Base value
00 | 00000000
01 | 00000001
10 | From PC + 1 (i.e. MIUData³¹⁻²⁴)
11 | 11111111

[8453] The base value is computed by using CMD₀ as bit 0, and copying CMD₁ into the upper 7 bits.

[8454] The 8-bit base value forms the lower 8 bits of output. These 8 bits are also ANDed with the sense of whether the data is replicated in the upper bits or not (i.e. CMD₂). The resultant bits are copied in 3 times to form the upper 24 bits of the output.
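The construction of the 32-bit immediate value can be followed in a short Python sketch. The function and parameter names are hypothetical; `pc_plus_1_byte` stands for the byte fetched from PC+1 (MIUData³¹⁻²⁴ in Table 376) and is only used when Cmd¹⁻⁰ = 10.

```python
def immediate_value(cmd, pc_plus_1_byte=0):
    """Build the 32-bit DataSel immediate from the low 3 bits of cmd."""
    if (cmd & 0b11) == 0b10:
        base = pc_plus_1_byte & 0xFF            # base value comes from PC+1
    else:
        bit0 = cmd & 1
        bit1 = (cmd >> 1) & 1
        # CMD0 is bit 0; CMD1 is copied into the upper 7 bits.
        base = (0xFE if bit1 else 0x00) | bit0
    # CMD2 decides whether the base value is replicated upwards.
    upper = base if (cmd >> 2) & 1 else 0
    return (upper << 24) | (upper << 16) | (upper << 8) | base
```

For example, a command whose low three bits are 111 yields 0xFFFFFFFF, while 011 yields 0x000000FF.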

[8455] FIG. 400 shows a block diagram of the ALU's DataSel block:

[8456] 14.2 ROR Block

[8457] The ROR block implements the ROR and RIA functionality of the ALU.

[8458] A 1-bit register named RTMP is contained within the ROR unit. RTMP is cleared to 0 on a RstL, and set during the ROR RB and ROR XRB commands. The RTMP register allows implementation of Linear Feedback Shift Registers with any tap configuration.

[8459] FIG. 401 shows a block diagram of the ALU's ROR block:

[8460] The ROR n blocks are shown for clarity, but in fact would be hardwired into multiplexor MX₃, since each block is simply a rewiring of the 32-bits, rotated right n bits.

[8461] Logic₁ is used to provide the WriteEnable signal to RTMP. The RTMP register should only be written to during ROR RB and ROR XRB commands. The combinatorial logic block is: Logic₁ = Exec ∧ (Cmd⁷⁻⁴ = ROR) ∧ (Cmd³⁻¹ = 000)

[8462] Multiplexor MX₁ performs the task of selecting the 6-bit value from Cn instead of bits 13-8 (6 bits) from Acc (the selection is based on the value of Logic₂). Bit 5 is required to distinguish ROR from RIA. Logic₂ = (Cmd⁵⁻² = 0x10)

[8463] TABLE 377 Selection for multiplexor MX₁
MX₁ Output | Logic₂
Cn | 1
Acc¹³⁻⁸ | 0

[8464] Multiplexor MX₂ performs the task of selecting the 8-bit value from InByte instead of the lower 8 bits from the ANDed Acc based on the CMD.

TABLE 378 Selection for multiplexor MX₂
MX₂ Output | Cmd⁴⁻⁰
InByte | 0x110
Acc⁷⁻⁰ | ¬(0x110)

[8465] Multiplexor MX₃ does the final rotating of the 32-bit value. The bit patterns of the CMD operand are taken advantage of:

TABLE 379 Selection for multiplexor MX₃
MX₃ Output | Cmd³⁻⁰ | Comments
ROR 1 | 00xx | RB, XRB, WriteMask, 1
ROR 3 | 010x | 3
ROR 31 | 0110 | 31
ROR 24 | 0111 | 24
ROR 8 | 1xxx | RIA, InByte, 8, OutByte, C1, C2, ID
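Table 379 amounts to a decode of the low four command bits into a rotate amount, followed by a 32-bit rotate right. A minimal Python sketch follows; the function names are hypothetical and the code models only the final rotation, not the MX₁ and MX₂ substitutions that precede it.

```python
def ror32(value, n):
    """Rotate a 32-bit value right by n bits."""
    value &= 0xFFFFFFFF
    n %= 32
    return ((value >> n) | (value << (32 - n))) & 0xFFFFFFFF


def mx3_rotation(cmd):
    """Return the rotate amount selected by Cmd bits 3-0 (Table 379)."""
    sel = cmd & 0xF
    if sel & 0b1000:
        return 8        # 1xxx
    if (sel >> 2) == 0b00:
        return 1        # 00xx
    if (sel >> 1) == 0b010:
        return 3        # 010x
    if sel == 0b0110:
        return 31       # 0110
    return 24           # 0111
```

Because each rotation is a fixed rewiring in hardware, the software rotate here only mirrors the result, for instance `ror32(0x12345678, mx3_rotation(0b1000))` rotates right by 8 bits.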

[8466] 14.3 IO Block

[8467] The IO block within the ALU implements the logic for communicating with the IOU during instructions that involve the Accumulator. This includes generating appropriate control signals and generating the correct data for sending during writes to the IOU's OutByte and LocalId registers.

[8468] FIG. 402 shows a block diagram of the IO block:

[8469] Logic₁ is used to provide the LocalIdWE signal to the IOU. The localId register should only be written to during the ROR ID command. Only the lower 7 bits of the Accumulator are written to the localId register.

[8470] Logic₂ is used to provide the ALUOutByteWE signal to the IOU. The OutByte register should only be written to during the ROR OutByte command. Only the lower 8 bits of the Accumulator are written to the OutByte register.

[8471] In both cases we output the lower 8 bits of the Accumulator. The ALUIOData value is ANDed with the output of Logic₂ to ensure that ALUIOData is only valid when it is safe to do so (thus the IOU logic never sees the key passing by in ALUIOData). The combinatorial logic blocks are:

Logic₁ = Exec ∧ (Cmd⁷⁻⁰ = ROR ID)
Logic₂ = Exec ∧ (Cmd⁷⁻⁰ = ROR OutByte)

[8472] Logic₃ is used to provide the ALUInByteUsed signal to the IOU. The InByte is only used during the ROR InByte command. The combinatorial logic is:

Logic₃ = Exec ∧ (Cmd⁷⁻⁰ = ROR InByte)

[8473] 15 Program Counter Unit

[8474] The Program Counter Unit (PCU) includes the 12 bit PC (Program Counter), as well as logic for branching and subroutine control.

[8475] The PCU latches need to be parity-checked. In addition, the logic and registers contained in the PCU must be covered by both Tamper Detection Lines to ensure that the PC cannot be changed by an attacker.

[8476] The PC is implemented as a 12 entry by 12-bit PCA (PC Array), indexed by a 4-bit SP (Stack Pointer) register. The PC, PCRamSel and SP registers are all cleared to 0 on a RstL, and updated during the flow of program control according to the opcodes.

[8477] The current value for the PC is normally updated during the Execute cycle according to the command being executed. However it is also incremented by 1 during the Fetch cycle for two byte instructions such as JMP, JSR, DBR, TBR, and instructions that require an additional byte for immediate addressing. The mechanism for calculating the new PC value depends upon the opcode being processed.

[8478] FIG. 403 shows a block diagram of the PCU:

[8479] The ADD block is a simple adder modulo 2¹² with two inputs: an unsigned 12 bit number and an 8-bit signed number (high bit=sign). The signed input is either a constant of 0x01, or an 8-bit offset (the 8 bits from the MIU).

[8480] The “+1” block takes a 4-bit input and increments it by 1 (modulo 12). The “−1” block takes a 4-bit input and decrements it by 1 (modulo 12).

[8481] Table 380 lists the different forms of PC control:

TABLE 381 Different forms of PC control during the Exec cycle
Command | Action
JMP | The PC is loaded with the current 12-bit value as passed in from the MIU.
JPI | The PC is loaded with the current 12-bit value as passed in from the Acc. PCRamSel is loaded with the value from bit 15 of the Acc.
JPZ | The PC is loaded with 0. PCRamSel is loaded with 0 (program in flash).
JSZ | Save old value of PC onto stack for later. The PC is loaded with 0. PCRamSel is loaded with 0 (program in flash).
JSR, JSI | Save old value of PC onto stack for later. The PC is loaded with the current 12-bit value as passed in from either the MIU or the Acc. With JSI, PCRamSel is loaded from the value in bit 15 of the Accumulator.
RTS | Pop old value of PC from stack and increment by 1 to get new PC.
TBR | If the Z flag matches the TBR test, add 8-bit signed number (MIU8Data) to current PC. Otherwise increment current PC by 1.
DBR | If the CZ flag is set, add 8-bit signed offset (MIU8Data) to current PC. Otherwise increment current PC by 1.
All others | Increment current PC by 1.

[8482] The updating of PCRamSel only occurs during JPI, JSI, JPZ and JSZ instructions, detected via Logic₀.

[8483] The same action for the Exec takes place for JMP, JSR, JPI, JSI, JPZ and JSZ, so we specifically detect that case in Logic₁. In the same way, we test for the RTS case in Logic₂.

Logic₀ = (Cmd⁷⁻¹ = 011x001)
Logic₁ = (Cmd⁷⁻⁵ = 000) ∨ Logic₀
Logic₂ = (Cmd⁷⁻⁰ = RTS)

[8484] When updating the PC, we must decide if the PC is to be replaced by a completely new value (as in the case of the JMP, JSR, JPI, JSI, JPZ and JSZ instructions), or by the result of the adder (all other instructions). The output from Logic₁ ANDed with Cycle can therefore be safely used by the multiplexor to obtain the new PC value (we need to always select PC+1 when Cycle is 0, even though we don't always write it to the PCA).

[8485] Note that the JPZ and JSZ instructions are implemented as 12 AND gates that cause the Accumulator value to be ignored, and the new PC to be set to 0. Likewise, the PCRamSel bit is cleared via these two instructions using the same AND mechanism.

[8486] The input to the 12-bit adder depends on whether we are incrementing by 1 (the usual case), or adding the offset as read from the MIU (when a branch is taken by the DBR and TBR instructions). Logic₃ generates the test. Logic₃ Cycle

(((Cmd⁷⁻⁴ = DBR)

CZ)

((Cmd⁷⁻⁴ = TBR)

(Cmd₀ ⊕ Z)))

[8487] The actual offset to be added in the case of the DBR and TBR instructions is either the 8-bit value read from the MIU, or an 8-bit value generated by bits 3-1 of the opcode and treating bit 4 of the opcode as the sign (thereby making DBR immediate branching negative, and TBR immediate branching positive). The former is selected when bits 3-1 of the opcode are 0, as shown by Logic₄.

Logic₄
If (Cmd³⁻¹ = 000)
    output MIU8Data
Else
    output Cmd₄ | Cmd₄ | Cmd₄ | Cmd₄ | Cmd₄ | Cmd³⁻¹
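The offset selection of Logic₄ and the modulo 2¹² addition of the ADD block can be worked through in Python. This is a sketch with hypothetical function names; it relies only on the bit positions given in the text (bit 4 as sign, bits 3-1 as magnitude) and makes no assumption about the actual DBR or TBR opcode encodings.

```python
def branch_offset(cmd, miu8data):
    """Return the signed 8-bit branch offset chosen by Logic4."""
    low = (cmd >> 1) & 0b111                    # Cmd bits 3-1
    if low == 0:
        raw = miu8data & 0xFF                   # offset read from the MIU
    else:
        sign = (cmd >> 4) & 1                   # Cmd bit 4 replicated 5 times
        raw = (0xF8 if sign else 0x00) | low
    # Interpret the 8-bit pattern as a two's complement number.
    return raw - 256 if raw & 0x80 else raw


def next_pc(pc, offset):
    """Add a signed offset to a 12-bit PC, wrapping modulo 2**12."""
    return (pc + offset) & 0xFFF
```

With bit 4 set and bits 3-1 equal to 011 the pattern is 11111011, an offset of -5; with bit 4 clear the same low bits give +3.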

[8488] Finally, the selection of which PC entry to use depends on the current value for SP. As we enter a subroutine, the SP index value must increment, and as we return from a subroutine, the SP index value must decrement. Logic₁ tells us when a subroutine is being entered, and Logic₂ tells us when the subroutine is being returned from. We use Logic₂ to select the altered SP value, but only write to the SP register when Exec and Cmd₄ are also set (to prevent JMP and JPZ from adjusting SP).
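The interaction between the PC Array and the stack pointer can be sketched as follows. This Python model is an interpretation for illustration: the class and method names are hypothetical, and it assumes that the current PC is simply the PCA entry selected by SP, so that entering a subroutine leaves the caller's PC untouched in the previous entry.

```python
class PCStack:
    """Illustrative model of the 12-entry PC Array indexed by SP."""

    def __init__(self):
        self.pca = [0] * 12     # 12 entries of 12 bits each
        self.sp = 0

    @property
    def pc(self):
        return self.pca[self.sp]

    def increment(self):
        self.pca[self.sp] = (self.pc + 1) & 0xFFF

    def jsr(self, target):
        # Moving SP on keeps the old PC in the previous entry.
        self.sp = (self.sp + 1) % 12
        self.pca[self.sp] = target & 0xFFF

    def rts(self):
        # Step back to the caller's entry and move past the call.
        self.sp = (self.sp - 1) % 12
        self.increment()
```

Calling `jsr(0x200)` from PC 0x010 and then `rts()` leaves the PC at 0x011, matching the RTS description of popping the old PC and incrementing it by 1.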

[8489] The two VAL units are validation units connected to the Tamper Prevention and Detection circuitry (described in Section 10.3.5 on page 962), each with an OK bit. The OK bit is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with each data bit that passes through the unit. Both VAL units also parity-check the data bits to ensure that they are valid. If the parity-check fails, the Erase Tamper Detection Line is triggered.

[8490] In the case of VAL₁, the effective output from the SP register will always be 0 if the chip has been tampered with. This prevents an attacker from executing any subroutines.

[8491] In the case of VAL₂, the effective PC output will always be 0 if the chip has been tampered with. This prevents an attacker from executing any program code.

[8492] 16 Address Generator Unit

[8493] The Address Generator Unit (AGU) generates effective addresses for accessing the Memory Unit (MU). In Cycle 0, the PC is passed through to the MU in order to fetch the next opcode. The AGU interprets the returned opcode in order to generate the effective address for Cycle 1. In Cycle 1, the generated address is passed to the MU.

[8494] The logic and registers contained in the AGU must be covered by both Tamper Detection Lines. This is to ensure that an attacker cannot alter any generated address. The latches for the counters and calculated address should also be parity-checked.

[8495] If either of the Tamper Detection Lines is broken, the AGU will generate address 0 each cycle and all counters will be fixed at 0. This will only come into effect if an attacker has disabled the RESET and/or erase circuitry, since under normal circumstances, breaking a Tamper Detection Line will result in a RESET or the erasure of all Flash memory.

[8496] 16.1 Implementation

[8497] The block diagram for the AGU is shown in FIG. 404:

[8498] The accessMode and WriteMask registers must be cleared to 0 on reset to ensure that no access to memory occurs at startup of the CPU.

[8499] The Adr and accessMode registers are written to during the final cycle of cycle 0 (Fetch) and cycle 1 (Exec) with the address to use during the following cycle phase. For example, when cycle=1, the PC is selected so that it can be written to Adr during Exec. During cycle 0, while the PC is being output from Adr, the address to be used in the following cycle 1 is calculated (based on the fetched opcode seen as Cmd) and finally stored in Adr when Fetch is 1. The accessMode register is also updated in the same way.

[8500] It is important to distinguish between the value of Cmd during different values for Cycle:

[8501] During Cycle 0, when Fetch is 1, the 8-bit input Cmd holds the instruction to be executed in the following Cycle 1. This 8-bit value is used to decode the effective address for the operand of the instruction.

[8502] During Cycle 1, when Exec is 1, Cmd holds the currently executing instruction.

[8503] The WriteMask register is only ever written to during execution of an appropriate ROR instruction. Logic₁ sets the WriteMask and MMRWriteEnables respectively based on this condition: Logic₁ = Exec ∧ (Cmd⁷⁻⁰ = ROR WriteMask)

[8504] The data written to the WriteMask register is the lower 8 bits of the Accumulator.

[8505] The Address Register Unit is only updated by an RIA or LIA instruction, so the writeEnable is generated by Logic₂ as follows: Logic₂ = Exec ∧ (Cmd⁶⁻³ = 1111)

[8506] The Counter Unit (CU) generates counters C1, C2 and the selected N index. In addition, the CU outputs a CZ flag for use by the PCU. The CU is described in more detail below.

[8507] The VAL₁ unit is a validation unit connected to the Tamper Prevention and Detection circuitry (described in Section 10.3.5 on page 962). It contains an OK bit that is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with the 12 bits of Adr before they can be used. If the chip has been tampered with, the address output will be always 0, thereby preventing an attacker from accessing other parts of memory. The VAL₁ unit also performs a parity check on the Adr Address bits to ensure it has not been tampered with. If the parity-check fails, the Erase Tamper Detection Line is triggered.

[8508] 16.1.1 Counter Unit

[8509] The Counter Unit (CU) generates counters C1 and C2 (used internally). In addition, the CU outputs Cn and flag CZ for use externally. The block diagram for the CU is shown in FIG. 405: Registers C1 and C2 are updated when they are the targets of a DBR, SC or ROR instruction. Logic₁ generates the control signals for the write enables as shown in the following pseudocode. isDbrSc = (Cmd⁷⁻⁴ = DBR)

(Cmd⁷⁻⁴ = SC) isRorCn = (Cmd⁷⁻⁴ = ROR)

(Cmd³⁻² = 10) CnWE = Exec

(isDbrSc

isRorCn) C1we = CnWE

Cmd₀ C2we = CnWE

Cmd₀

[8510] The single bit flag CZ is produced by the NOR of the appropriate C1 or C2 register for use during a DBR instruction. Thus CZ is 1 if the appropriate Cn value=0.

[8511] The actual value written to C1 or C2 depends on whether the ROR, DBR or SC instruction is being executed. During a DBR instruction, the value of either C1 or C2 is decremented by 1 (with wrap). One multiplexor selects between the lower 6 bits of the Accumulator (for ROR instructions), and a 6-bit value for an SC instruction where the upper 3 bits = the low 3 bits from C2, and low 3 bits = low 3 bits from Cmd. Note that only the lowest 3 bits of the operand are written to C1.

[8512] The two VAL units are validation units connected to the Tamper Prevention and Detection circuitry (described in Section 10.3.5 on page 962), each with an OK bit. The OK bit is set to 1 on PORstL, and ORed with the ChipOK values from both Tamper Detection Lines each cycle. The OK bit is ANDed with each data bit that passes through the unit. All VAL units also parity check the data to ensure the counters have not been tampered with. If a parity check fails, the Erase Tamper Detection Line is triggered.

[8513] In the case of VAL₁, the effective output from the counter C1 will always be 0 if the chip has been tampered with. This prevents an attacker from executing any looping constructs.

[8514] In the case of VAL₂, the effective output from the counter C2 will always be 0 if the chip has been tampered with. This prevents an attacker from executing any looping constructs.

[8515] 16.1.2 Calculate Next Address

[8516] This unit generates the address of the operand for the next instruction to be executed. It makes use of the Address Register Unit and PC to obtain base addresses, and the counters from the Counter Unit to assist in generating offsets from the base address.

[8517] This unit consists of some simple combinatorial logic, including an adder that adds a 6-bit number to a 10-bit number. The logic is shown in the following pseudocode. isErase = (Cmd⁷⁻⁰ = ERA) isSt = (Cmd⁷⁻⁴ = ST) isAccRead = (Cmd⁷⁻⁶ = 10) # First determine whether this is an immediate mode requiring PC+1 isJmpJsrDbrTbrImmed = (Cmd⁷⁻⁶ = 00)

(

Cmd₅

(Cmd⁵⁻¹ = 1x000)) isLia = (Cmd⁷⁻³ = LIA) isLogImmed = ((Cmd⁷⁻⁶ = 11)

((Cmd₅

Cmd₄)

(Cmd⁵⁻³ ≠ 111)))

(Cmd¹⁻⁰ = 10) pcSel = Cycle

(

Cycle

(isJmpJsrDbrTbrImmed

isLogImmed

isLia)) # Generate AnSel signal for the Address Register Unit A0Sel =(isAccRead

isSt)

(

Cmd₃

(Cmd⁵⁻³ = 001)) AnSel¹⁻⁰ =

A0Sel

Cmd²⁻¹ # The next address is either the new PC or must be generated #(we require the base address from Address Register Unit) nextRAMSel =AnDataOut₈

isErase If (nextRAMSel) baseAdr = 00 | AnDataOut⁷⁻⁰ # ram addresses arealready word aligned Else baseAdr = AnDataOut⁷⁻⁰ | 00 # flash addressesare 4-byte aligned EndIf # Base address is now word (4-byte) aligned #Now generate the offset amount to be added to the base address selCn =(isAccRead

isSt)

(Cmd₅

Cmd₄)

Cmd₃ offset₀ = (A0Sel

Cmd₀)

(selCn

Cn₀) offset₁ = (A0Sel

Cmd₁)

(selCn

Cn₁) offset₂ = (A0Sel

Cmd₂)

(selCn

Cn₂) offset⁵⁻³ = selCn

Cn⁵⁻³
If (isErase)
    nextEffAdr¹¹⁻⁴ = Acc⁷⁻⁰
    nextEffAdr³⁻⁰ = don't care
Else # now we can simply add the offset to the base address to get the effective adr
    nextEffAdr¹¹⁻² = baseAdr + offset # 10 bit plus 6 bit, with wrap = 10 bits out
    nextEffAdr¹⁻⁰ = 0 # word access, so lower bits of effadr are 0
EndIf
# Now generate the various signals for use during Cycle=1
# Note that these are only valid when pcSel is 0 (otherwise will read PC)
nextAccessMode₀ = 1 # want 32-bit access
nextAccessMode₁ = nextRAMSel # ram or flash access (only valid if rd/wr/erase set)
nextAccessMode₂ = isAccRead # pcSel takes care of LIA instruction
nextAccessMode₃ = isSt # write access
nextAccessMode₄ = isErase # erase page access
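The base address alignment and the 10-bit wrapping addition can be illustrated numerically. The Python sketch below covers only the final address calculation; the function and argument names are hypothetical, the 6-bit offset is supplied directly instead of being derived from the command and counters, and in the erase case the don't-care low bits are returned as 0 as an arbitrary choice.

```python
def next_eff_adr(an_data_out, offset, is_erase=False, acc=0):
    """Compute a 12-bit effective address from a 9-bit An register value."""
    if is_erase:
        # Bits 11-4 come from Acc bits 7-0; bits 3-0 are don't care.
        return (acc & 0xFF) << 4
    an = an_data_out & 0x1FF
    ram_sel = (an >> 8) & 1                 # bit 8 selects RAM
    low8 = an & 0xFF
    # RAM: 00 | An[7:0] (already word aligned); flash: An[7:0] | 00.
    base = low8 if ram_sel else (low8 << 2)
    word = (base + (offset & 0x3F)) & 0x3FF  # 10 bit plus 6 bit, with wrap
    return word << 2                         # word access: low 2 bits are 0
```

For a flash base of 0x005 and an offset of 3 the word address is 20 + 3 = 23, giving an effective byte address of 92.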

[8518] 16.1.3 Address Register Unit

[8519] This unit contains 4×9-bit registers that are optionally cleared to 0 on PORstL. The 2-bit input AnSel selects which of the 4 registers to output on DataOut. When the writeEnable is set, the AnSel selects which of the 4 registers is written to with the 9-bit DataIn.
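A behavioural model of this register file takes only a few lines of Python. The class and method names are hypothetical, and the model always starts with the registers cleared, although the text describes the clearing on PORstL as optional.

```python
class AddressRegisterUnit:
    """Four 9-bit address registers selected by a 2-bit AnSel."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]

    def clock(self, an_sel, data_in=0, write_enable=False):
        # On a clock edge, write DataIn to the selected register if enabled.
        if write_enable:
            self.regs[an_sel & 0b11] = data_in & 0x1FF

    def data_out(self, an_sel):
        # DataOut always reflects the register selected by AnSel.
        return self.regs[an_sel & 0b11]
```

The same AnSel input serves both the read and the write path, so a write to register 2 followed by a read with AnSel = 2 returns the 9-bit value just stored.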

[8520] 17 Program Mode Unit

[8521] The Program Mode Unit (PMU) is responsible for Program Mode and Trim Mode operations:

[8522] Program Mode involves erasing the existing flash memory and loading the new program/data into the flash. The program that is loaded can be a bootstrap program if desired, and may contain additional program code to produce a digital signature of the final program to verify that the program was written correctly (e.g. by producing a SHA-1 signature of the entire flash memory).

[8523] Trim Mode involves counting the number of internal cycles that have elapsed between the entry of Trim Mode (at the falling edge of the ack) and the receipt of the next byte (at the falling edge of the last bit before the ack) from the Master. When the byte is received, the current count value divided by 2 is transmitted to the Master.

[8524] The PMU relies on a fuse (implemented as the value of word 0 of the flash information block) to determine whether it is allowed to perform Program Mode operations. The purpose of this fuse is to prevent easy (or accidental) reprogramming of QA Chips once their purpose has been set. For example, an attacker may want to reuse chips from old consumables. If an attacker somehow bypasses the fuse check, the PMU will still erase all of flash before storing the desired program. Even if the attacker somehow disconnects the erasure logic, they will be unable to store a program in the flash due to the shadow nybbles.

[8525] The PMU contains an 8-bit buff register that is used to hold the byte being written to flash and a 12-bit adr register that is used to hold the byte address currently being written to.

[8526] The PMU is also used to load word 1 of the information block into a 32-bit register (combined from 8 bits of buff, 12 bits of adr, and a further 12-bit register) so it can be used to XOR all data to and from memory (both Flash and RAM) for future CPU accesses. This logic is activated only when the chip enters ActiveMode (so as not to access flash and possibly cause an erasure directly after manufacture, since shadows will not be correct). The logic and 32-bit mask register are in the PMU to minimize chip area.

[8527] The PMU therefore has an asymmetric access to flash memory:

[8528] writes are to main memory

[8529] reads are from information block memory

[8530] The reads and writes are automatically directed appropriately in the MRU.

[8531] A block diagram of the PMU is shown in FIG. 406.

[8532] 17.1 Local Storage and Counters

[8533] The PMU keeps a 1-cycle delayed version of MRURdy, called prevMRURdy. It is used to generate PMNewTrans. Therefore each cycle the PMU performs the following task:

[8534] prevMRURdy ← MRURdy ∨ (state = loadByte)

[8535] The PMU also requires 1-bit maskLoaded and idlePending registers, both of which are cleared to 0 on RstL. The 1-bit fuseBlown register is set to 1 on RstL for security.

[8536] 17.2 State Machine

[8537] The state machine for the PMU is shown in FIG. 407, with the pseudocode for the various states outlined below.

rstl
  prevMRURdy, maskLoaded, idlePending, adr ← 0  # clear most regs
  fuseBlown ← 1  # for security's sake assume the worst
  state ← idle

[8538] The idle state, entered after reset, simply waits for the IOMode to enter ProgramMode, ActiveMode, or TrimMode. Note that the reset value for fuseBlown means that ProgramMode and TrimMode cannot be entered until after a successful entry into ActiveMode that also clears the fuseBlown register. In state idle, PMEn = ¬maskLoaded, and in state wait4mode PMEn = 0. In all other states, PMEn = 1.

idle
  idlePending ← 0
  PMEn = ¬maskLoaded
  PMNewTrans = 0
  If ((IOMode = ActiveMode) ∧ MRURdy)
    If (maskLoaded)
      state ← wait4mode  # no need to reload mask once loaded
    Else
      adr ← 0  # the location of the fuse is within 32-bit word 0
      state ← loadFuse
    EndIf
  ElseIf ((IOMode = ProgramMode) ∧ MRURdy ∧ ¬fuseBlown)  # wait for access to finish
    maskLoaded ← 0  # the mask is now invalid
    adr ← 0  # the location of the fuse is within 32-bit word 0
    state ← loadFuse
  ElseIf ((IOMode = TrimMode) ∧ MRURdy ∧ ¬fuseBlown)  # wait for access to finish
    maskLoaded ← 0  # the mask is now invalid
    adr ← 0  # start the counter on entering TrimMode
    state ← trim
  Else
    state ← idle
  EndIf

[8539] The wait4mode state simply waits for the current mode to finish and then returns to idle.

wait4mode
  PMEn = 0
  PMNewTrans = 0
  If (IOMode = IdleMode)
    state ← idle
  Else
    state ← wait4mode
  EndIf

[8540] The trim state is where we count the number of cycles between the entry of Trim Mode and the arrival of a byte from the Master. When the byte arrives from the Master, we send the resultant count:

trim
  # We saturate the adder at all 1s to make external trim control easier
  lastOne = adr₀ ∧ adr₁ ∧ ... ∧ adr₁₁
  If (¬lastOne)
    adr = adr + 1  # 12 bit incrementor
  EndIf
  # This logic simply causes the current adder value to be written to the
  # outByte when the inByte is received. The inByte is cleared when received
  # although it is not strictly necessary to do so
  PMOutByteWE = InByteValid  # 0 in all other states
  PMInByteUsed = InByteValid  # same as in loadByte state, 0 in all other states
  If (IOMode ≠ TrimMode)
    state ← idle
  ElseIf (InByteValid)
    state ← wait4mode
  Else
    state ← trim
  EndIf
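The saturating count performed in the trim state can be modelled in a few lines of Python. This is only an illustrative sketch: the helper name `trim_count` and its `cycles` argument are assumptions introduced here, while the 12-bit width and the saturate-at-all-1s behaviour are taken from the pseudocode above.

```python
ADR_BITS = 12
ADR_MAX = (1 << ADR_BITS) - 1  # all 1s (0xFFF)


def trim_count(cycles: int) -> int:
    """Model the value of adr after `cycles` clocks spent in the trim state.

    Hypothetical helper: adr starts at 0 (cleared on entering TrimMode) and
    is incremented each cycle unless every bit is already 1.
    """
    adr = 0
    for _ in range(cycles):
        last_one = adr == ADR_MAX  # adr0 AND adr1 AND ... AND adr11
        if not last_one:
            adr += 1  # 12-bit incrementor, held once saturated
    return adr
```

Because the counter holds at 0xFFF rather than wrapping, a Master that waits too long reads a recognisable maximum instead of a misleading small value.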

[8541] The loadFuse state is entered whenever there is an attempt to program the device or we are entering ActiveMode and the mask is invalid (i.e. after power up or after a ProgramMode or TrimMode command). We load the 32-bit fuse value from word 0 of information memory in flash and compare it against the FuseSig constant (0x5555AAAA) to obtain the fuse value. The next state depends on IOMode and the fuse.

loadFuse
  PMEn = 1
  PMNewTrans = prevMRURdy
  idlePending_in = idlePending ∨ (IOMode = IdleMode)
  idlePending ← idlePending_in
  If (MRURdy)
    If (idlePending_in)  # don't change state until the memory access is complete
      state ← idle
    Else
      fuseBlown_in = (MRUData³¹⁻⁰ = FuseSig)
      fuseBlown ← fuseBlown_in
      If (IOMode = ProgramMode)
        If (fuseBlown_in)
          state ← wait4mode  # not allowed to program anymore
        Else
          state ← erase
        EndIf
      ElseIf (IOMode = ActiveMode)
        adr ← 4  # byte 4 is word 1 (the location of the XORMask)
        state ← getMask
      Else
        state ← idle
      EndIf
    EndIf
  Else
    state ← loadFuse
  EndIf
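The next-state decision made in loadFuse can be sketched as a pure function. The Python below is a reading of the pseudocode above, not a definitive implementation: the function name, the string state labels and the tuple return shape are assumptions for illustration, and the OR used to form `idle_pending_in` is inferred from context.

```python
FUSE_SIG = 0x5555AAAA  # FuseSig constant from the text


def load_fuse_next_state(io_mode: str, mru_rdy: bool,
                         idle_pending: bool, mru_data: int):
    """Return (next_state, fuse_blown) for one cycle spent in loadFuse.

    fuse_blown is None when the fuse word has not been sampled this cycle.
    """
    idle_pending_in = idle_pending or io_mode == "IdleMode"
    if not mru_rdy:
        return "loadFuse", None  # memory access still in progress
    if idle_pending_in:
        return "idle", None  # mode was abandoned during the access
    fuse_blown = (mru_data & 0xFFFFFFFF) == FUSE_SIG
    if io_mode == "ProgramMode":
        # A blown fuse forbids reprogramming
        return ("wait4mode" if fuse_blown else "erase"), fuse_blown
    if io_mode == "ActiveMode":
        return "getMask", fuse_blown
    return "idle", fuse_blown
```

For example, an erased information block (all 1s) does not match FuseSig, so a ProgramMode request proceeds to the erase state.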

[8542] The erase state erases the flash memory and then leads into the main programming states:

erase
  PMNewTrans = prevMRURdy
  PMEraseDevice = 1  # is 0 in all other states
  adr ← 0
  idlePending_in = idlePending ∨ (IOMode ≠ ProgramMode)
  idlePending ← idlePending_in
  If (MRURdy)
    If (idlePending_in)
      state ← idle
    Else
      state ← loadByte
    EndIf
  Else
    state ← erase
  EndIf

[8543] Program Mode involves loading a series of 8-bit data values into the Flash. The PMU reads bytes via the IOU's InByte and InByteValid, setting PMInByteUsed as it loads data. The Master must send data slightly more slowly than the Flash can be written, to ensure that data is not lost.

loadByte
  # Load in 1 byte from IO Unit
  PMNewTrans = 0
  PMInByteUsed = InByteValid  # same as in trim state, and 0 in all other states
  If (IOMode ≠ ProgramMode)
    state ← idle
  Else
    If (InByteValid)
      buff ← InByte
      state ← writeByte
    Else
      state ← loadByte
    EndIf
  EndIf

writeByte
  PMNewTrans = prevMRURdy
  PMRW = 0  # write. In all other states, PMRW = 1 (read)
  PM32Out⁷⁻⁰ = buff  # data (can be tied to this)
  PM32Out¹⁹⁻⁸ = adr  # can be tied to this
  PM32Out³¹⁻²⁰ = 12bitReg  # is always this (is don't care during a write)
  idlePending_in = idlePending ∨ (IOMode ≠ ProgramMode)
  idlePending ← idlePending_in
  If (MRURdy)
    lastOne = adr₀ ∧ adr₁ ∧ ... ∧ adr₁₁
    adr ← adr + 1  # 12 bit incrementor
    If (idlePending_in)
      state ← idle
    ElseIf (lastOne)
      state ← wait4mode
    Else
      state ← loadByte
    EndIf
  Else
    state ← writeByte
  EndIf

[8544] The getMask state loads word 1 of the flash information block (bytes 4-7) into the 32-bit buffer so it can be used to XOR all data to and from memory (both Flash and RAM) for future CPU accesses.

getMask
  PMNewTrans = prevMRURdy
  PM32Out¹⁹⁻⁸ = adr  # adr should = 4, i.e. word 1 which holds the CPU's mask
  PMRW = 1  # read (MUST be 1 in this state)
  idlePending_in = idlePending ∨ (IOMode ≠ ActiveMode)
  idlePending ← idlePending_in
  If (MRURdy)
    buff ← MRUData⁷⁻⁰
    adr ← MRUData¹⁹⁻⁸
    12bitReg ← MRUData³¹⁻²⁰
    maskLoaded ← 1
    If (idlePending_in)
      state ← idle
    Else
      state ← wait4mode
    EndIf
  Else
    state ← getMask
  EndIf

[8545] 18 Memory Request Unit

[8546] The Memory Request Unit (MRU) provides arbitration between PMU memory requests and CPU-based memory requests.

[8547] The arbitration is straightforward: if the input PMEn is asserted, then PMU inputs are processed and CPU inputs are ignored. If PMEn is deasserted, the reverse is true.

[8548] A block diagram of the MRU is shown in FIG. 408.

[8549] 18.1 Arbitration Logic

[8550] The arbitration logic block arbitrates between the accesses of the PMU and the 8/32-bit accesses of the CPU via a simple multiplexing mechanism based on PMEn:

ReqDataOut³¹⁻⁸ = CPUDataOut³¹⁻⁸
If (PMEn)
  NewTrans = PMNewTrans
  AccessMode₀ = PMRW  # maps to 1 for reads (32 bits), 0 for writes (8 bits)
  AccessMode₁ = 0  # flash accesses only
  AccessMode₂ = PMRW ∧ ¬PMEraseDevice  # read has lower priority than erase
  AccessMode₃ = ¬PMRW ∧ ¬PMEraseDevice  # write has lower priority than erase
  AccessMode₄ = 0  # pageErase
  AccessMode₅ = PMEraseDevice  # erase everything (main & info block)
  WriteMask = 0xFF
  Adr = PM32Out¹⁹⁻⁸
  ReqDataOut⁷⁻⁰ = PM32Out⁷⁻⁰
Else
  NewTrans = CPUNewTrans ∧ (CPUAccessMode⁴⁻² ≠ 000)
  AccessMode⁴⁻⁰ = CPUAccessMode
  AccessMode₅ = 0  # cpu cannot ever erase entire chip
  WriteMask = CPUWriteMask
  Adr = CPUAdr
  ReqDataOut⁷⁻⁰ = CPUDataOut⁷⁻⁰
EndIf

[8551] 18.2 Memory Request Logic

[8552] The Memory Request Logic in the MRU implements the memory requests from the selected input.

[8553] An individual request may involve outputting multiple sub-requests, e.g. an 8-bit read consists of 2×4-bit reads (each flash byte contains a nybble plus its inverse).
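The nybble-plus-inverse storage of each flash byte can be illustrated with a small Python sketch. The bit layout used here (even bits carry the data, odd bits carry the inverse) is a reading of the MRU main-logic pseudocode in section 18.2.2; the function names are assumptions, and the XOR mask applied to CPU accesses is deliberately left out.

```python
def encode_shadowed(nybble: int) -> int:
    """Pack a 4-bit value into one flash byte.

    Bit 2k holds nybble bit k; bit 2k+1 holds its inverse (assumed layout).
    """
    byte = 0
    for k in range(4):
        bit = (nybble >> k) & 1
        byte |= bit << (2 * k)
        byte |= (bit ^ 1) << (2 * k + 1)
    return byte


def decode_shadowed(byte: int):
    """Return the stored nybble, or None when the shadow bits are invalid.

    An erased byte (all 1s) is treated as reading 0, as the main-logic
    comments describe.
    """
    if byte == 0xFF:
        return 0
    nybble = 0
    for k in range(4):
        data = (byte >> (2 * k)) & 1
        shadow = (byte >> (2 * k + 1)) & 1
        if data == shadow:
            return None  # every data bit must differ from its shadow
        nybble |= data << k
    return nybble
```

Under this layout a byte written by any means other than the shadowing logic is very likely to fail the check, which is why the text notes that an attacker cannot store a usable program simply by bypassing erasure.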

[8554] The input accessMode bits are interpreted as follows:

TABLE 382 Interpretation of accessMode bits
Bit | Description
0 | 0 = 8-bit access; 1 = 32-bit access
1 | 0 = flash access; 1 = RAM access (this bit is only valid if bit 2, 3 or 4 is set)
2 | 1 = read access
3 | 1 = write access
4 | 1 = erase page access
5 | 1 = erase entire (info and main) flash (only used within the MRU)
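As an illustration of Table 382, the following Python sketch decodes a 6-bit accessMode value into named fields. The function and dictionary key names are assumptions made for readability; only the bit meanings come from the table.

```python
def decode_access_mode(mode: int) -> dict:
    """Decode a 6-bit accessMode value (field names are illustrative)."""
    is_op = bool(mode & 0b011100)  # bit 1 is only valid if bit 2, 3 or 4 is set
    return {
        "width_bits": 32 if mode & 0x01 else 8,
        "target": ("RAM" if mode & 0x02 else "flash") if is_op else None,
        "read": bool(mode & 0x04),
        "write": bool(mode & 0x08),
        "erase_page": bool(mode & 0x10),
        "erase_device": bool(mode & 0x20),
    }
```

For instance, `decode_access_mode(0b000101)` describes a 32-bit read from flash, the access the PMU issues when loading the fuse word.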

[8555] The MRU contains the following registers for general purpose flow control:

TABLE 383 Description of register settings
name | #bits | Description
activeTrans | 1 | Is there a transaction still running? If so, then extraTrans and nextToXfer can be considered valid.
badUntilRestart | 1 | 0 = memory (flash and ram) reads work correctly; 1 = memory (flash and ram) reads return 0. Gets set whenever illChip gets set, and remains set until a soft restart occurs, i.e. IOMode passes through Idle.
extraTrans | 1 | Determines whether there is an additional sub-transaction to perform, e.g. a 32-bit read from flash involves 4 sub-transactions in the case of 8-bit accesses, and 8 sub-transactions in the case of 4-bit accesses.
illChip | 1 | 0 = 15 consecutive bad reads have not occurred; 1 = 15 consecutive bad reads have occurred
nextToXfer | 3 | The next element (byte or nybble) number to transfer to/from memory
restartPending | 1 | 1 = IOMode passed through Idle while a transaction was being processed; 0 = the transaction completed without IOMode passing through Idle
retryCount | 4 | Number of times that a byte has been read badly from flash. When a byte has been read badly 15 consecutive times illChip will be set.
retryStarted | 1 | 0 = no retries encountered yet for this read; 1 = retries have been encountered, and retryCount holds the number of retries. The retryStarted register is used to stop retryCount being cleared on good reads, thus keeping a record of the last number of retries on a bad read.

[8556] Table 383 lists the registers specifically for testing flash. Although the complete set of flash test registers is in both the MRU and MAU (group 0 is in the MRU, groups 1 and 2 are in the MAU), all the decoding takes place from the MRU.

TABLE 383 Flash test registers settable from CPU when the RAM address is > 128
adr bit | bits | name | description
0 | 0 | shadowsOff | 0 = regular shadowing (nybble based access to flash); 1 = shadowing disabled, 8-bit direct accesses to flash
0 | 1 | hiFlashAdr | Only valid when shadowsOff = 1. 0 = accesses are to lower 4 Kbytes of flash; 1 = accesses are to upper 4 Kbytes of flash
0 | 2 | |
1 | 3 | enableFlashTest | 0 = keep flash test register within the TSMC flash IP in its reset state; 1 = enable flash test register to take on non-reset values
1 | 8-4 | flashTest | Internal 5-bit flash test register within the TSMC flash IP (SFC008_08B9_HE). If this is written with 0x1E, then subsequent writes will be according to the TSMC write test mode. You must write a non-0x1E value or reset the register to exit this mode.
2 | 28-9 | flashTime | When timerSel is 1, this value is used for the duration of the program cycle within a standard flash write or erasure. 1 unit = 16 clock cycles (16 × 100 ns typical). Regardless of timerSel, this value is also used for the timeout following power down detection before the QA Chip resets itself. 1 unit = 1 clock cycle (= 100 ns typical). Note that this means the programmer should set this to an appropriate value (e.g. 5 μs), just as the localId needs to be set.
2 | 29 | timerSel | 0 = use internal (default) timings for flash writes and erasures; 1 = use flashTime for flash writes and erasures

[8557] 18.2.1 Reset

[8558] Initialization on reset involves clearing all the flags:

MRURdy = 0  # can't process anything at this point
activeTrans ← 0
extraTrans ← 0
illChip ← 0
badUntilRestart ← 0
restartPending ← 0
retryCount ← 0
retryStarted ← 0
nextToXfer ← 0  # don't care
shadowsOff ← 0
hiFlashAdr ← 0
infoBlockSel ← 0  # used to generate MRUMode₂

[8559] 18.2.2 Main Logic

[8560] The main logic consists of waiting for a new transaction, and starting an appropriate sub-transaction accordingly, as shown in the following pseudocode: # Generate some basic signals for use in determining accessPatterns Is32Bit = AccessMode₀ Is8Bit =

AccessMode₀ IsFlash =

AccessMode₁ IsRAM = AccessMode₁ IsRead = AccessMode₂ IsWrite =AccessMode₃ noShadows = shadowsOff doShadows = IsFlash

noShadows continueRequest = (IOMode ≠ IdleMode) okForTrans =

restartPending

continueRequest startOfSubTrans = (NewTrans

extraTrans)

okForTrans doingTrans = startOfSubTrans

(activeTrans

extraTrans) IsInvalidRAM = doingTrans

IsRAM

(Adr₉

(Adr₈

Adr₇)) IsTestModeWE = doingTrans

IsRAM

IsWrite

Adr₉ IsTestReg₀ = IsTestModeWE

Adr₃ #write to flash test register - bit 1 of word adr IsTestReg₁ =IsTestModeWE

Adr₄ #write to flash test register - bit 2 of word adr MRUTestWE =IsTestReg₀

IsTestReg₁ IsPageErase = AccessMode₄ IsDeviceErase = AccessMode₅

(IsTestModeWE

(Adr⁸⁻² = 0001000)) # bit 9 not req IsErase = IsDeviceErase

IsPageErase MRURAMSel = IsRAM

MRUTestWE

IsDeviceErase IsInfBlock = (PMEn

(IsDeviceErase

IsRead))

(

PMEn

infoBlockSel

(IsDeviceErase

(IsFlash

(Adr¹¹⁻⁷ = 0)

(Adr₆

doShadows)))) # Which element (byte or nybble) are we up to xferring? If(NewTrans) toXfer = 0 Else toXfer = nextToXfer EndIf # Form the addressthat goes to the outside world If (IsFlash

noShadows) byteCount = toXfer¹⁻⁰ MRUAdr₁₂ = hiFlashAdr # upper or lowerblock of 4Kbytes of flash MRUAdr¹¹⁻² = Adr¹¹⁻² # word # MRUAdr¹⁻⁰ =(Adr¹⁻⁰

(

Is32Bit|

Is32Bit))

byteCount # byte Else byteCount = toXfer²⁻¹ MRUAdr¹²⁻³ = Adr¹¹⁻² # word# MRUAdr²⁻¹ = (Adr¹⁻⁰

(

Is32Bit|

Is32Bit))

byteCount # byte MRUAdr₀ = toXfer₀ #nybble EndIf # Assuming a write, arewe allowed to write to this address? writeEn = SelectBit [WriteMask,((MRUAdr₂

doShadows)| MRUAdr¹⁻⁰)]# mux: 1 from 8 # Generate the 4-bit mask to beused for XORing during CPU access to flash baseMask = SelectNybble(PM32Out, MRUAdr²⁻⁰) # mux selects 4 bits of 32 If (PMEn) theMask = 0Else theMask = baseMask # we only use mask for CPU accesses to flashEndIf # Select a byte (and nybble) from the data for writes baseByte =SelectByte[ReqDataOut, byteCount] # mux: 8 bits from 32 baseNybble =SelectNybble [baseByte, toXfer₀] # mux: 4 bits from 8 outNybble =baseNybble ⊕ theMask # only used when nybble writing # Generate the dataon the output lines (doesn't matter for reads or erasures)MRUDataOut³¹⁻⁰ = ReqDataOut³¹⁻⁸ # effectively don't care for flashwrites If (doShadows) MRUDataOut₇ =

outNybble₃ MRUDataOut₆ = outNybble₃ MRUDataOut₅ =

outNybble₂ MRUDataOut₄ = outNybble₂ MRUDataOut₃ =

outNybble₁ MRUDataOut₂ = outNybble₁ MRUDataOut₁ =

outNybble₀ MRUDataOut₀ = outNybble₀ Else MRUDataOut⁷⁻⁰ = baseByte EndIf# Setup MRUMode allowTrans = IsRAM

IsRead

(IsWrite

writeEn)

IsErase If (doingTrans) MRUMode₂ = IsInfBlock MRUMode₁ = IsErase

IsTestReg₁ MRUMode₀ = IsDeviceErase

(

IsWrite

IsPageErase)

IsTestReg₀ MRUNewTrans = startOfSubTrans

allowTrans

(

IsInvalidRAM

MRUTestWE

IsDeviceErase) Else MRUMode²⁻⁰ = 001 # read (safe) MRUNewTrans = 0 EndIf# Generate the effective nybble read from flash (this may not be used).# When there is a shadowFault (non-erased memory and invalid shadows) weconsider # it a bad read when an 8-bit read, or when writeMask₀ is 0. #Note: we always substitute the upper nybble of WriteMask for thenon-valid data, # but only flag a read error if WriteMask₀ is also 1.When the data is erased, # we return 0 regardless of WriteMask₀.finishedTrans = doingTrans

MAURdy finishedFlashSubTrans = finishedTrans

IsFlash

IsErase isWrittenFlash = (FlashData⁷⁻⁰ ≠ 11111111) # flash is erased toall 1s If (isWrittenFlash

((FlashData_(7,5,3,1) ⊕ FlashData_(6,4,2,0)) ≠ 1111)) inNybble³⁻⁰ =WriteMask⁷⁻⁴ badRead = finishedFlashSubTrans

IsRead

(Is8Bit

WriteMask₀)

doShadows Else inNybble_(3,2,1,0) = (theMask_(3,2,1,0) ⊕FlashData_(6,4,2,0))

isWrittenFlash badRead = 0 EndIf # Present the resultant data to theoutside world MaskTheData = IsInvalidRAM

badRead

(badUntilRestart

IsRAM) NoData = IsErase

IsWrite

doingTrans If (NoData

MaskTheData) MRUData₀ = IsInvalidRAM

illChip MRUData⁴⁻¹ = retryCount

(IsInvalidRAM

Adr₂) # mask all 4 count bits MRUData³¹⁻⁵ = 0 # also ensures a read thatis bad returns 0 ElseIf (IsRAM) MRUData³¹⁻²⁴ = SelectByte[RAMData,(Adr¹⁻⁰

Is32Bit|Is32Bit)] # mux: 8 from 32 MRUData²³⁻⁰ = RAMData²³⁻⁰ # lsbsremain unchanged from RAM ElseIf (doShadows) MRUData³¹⁻²⁸ = inNybbleMRUData²⁷⁻⁰ = buff²⁷⁻⁰ Else MRUData³¹⁻²⁴ = FlashData MRUData²³⁻⁰ =buff²⁷⁻⁴ EndIf # Shift in the data for the good reads - either 4 or 8bits (writes = don't care) If (finishedFlashSubTrans

badRead) buff³⁻⁰

buff⁷⁻⁴ # shift right 4 bits If (doShadows) buff²³⁻⁴

buff²⁷⁻⁸ # shift right 4 bits buff²⁷⁻²⁴

inNybble Else buff¹⁹⁻⁴

buff²⁷⁻¹² # shift right 8 bits, buff³⁻⁰ is don't care buff²⁷⁻²⁰

FlashData EndIf EndIf # Determine whether or not we need a newsub-transaction. We only need one if: # * there hasn't been a transitionto IdleMode during this transaction # * we're doing 8 bit reads that areshadowed # * we're doing 32 bit reads and we've done less than 4 or 8(sh vs non-sh) # * wegot a bad read from flash and we need to retry theread (jic was a glitch) moreAdrsToGo = (

toXfer₀

((Is8Bit

doShadows)

Is32Bit))

(

toXfer₁

Is32Bit)

(

toXfer₂

Is32Bit

doShadows) needToRetryRead = badRead

(

retryStarted

(retryCount ≠ 1111)) extraTrans_in = finishedFlashSubTrans

(moreAdrsToGo

needToRetryRead)

okForTrans nextToXfer

toXfer + (finishedFlashSubTrans

(IsWrite

needToRetryRead)) # generate our rdy signal and state values for nextcycle MRURdy =

doingTrans

(doingTrans

MAURdy

extraTrans_in) extraTrans

extraTrans_in activeTrans

MRURdy # all complete only when MRURdy is set # Take account of badreads triedEnough = badRead

retryStarted

(retryCount = 1111) If (MAURdy) If (IsTestModeWE

(Adr⁵⁻² = 0000)) # capture writes to local regs illChip

0 retryCount

0 Else illChip

illChip

triedEnough If (badRead) retryCount

(retryCount

retryStarted) + 1 # AND all 4 bits retryStarted

1 Else retryStarted

0# clear flag so will be ok for the next read EndIf EndIf EndIf # Ensurethat we won't have problems restarting a program If {MRURdy

okForTrans) # note MRURdy (may not be running a transaction!)shadowsOff, hiFlashAdr, infoBlockSel, restartPending, badUntilRestart

0 Else badUntilRestart

badUntilRestart

triedEnough If (doingTrans

ContinueRequest) restartPending

1 # record for later use EndIf If (IsTestModeWE

Adr₂) # the other writes are taken care of by the MAU shadowsOff

ReqDataOut₀ hiFlashAdr

ReqDataOut₁ infoBlockSel

ReqDataOut₂ EndIf EndIf

[8561] 19 Memory Access Unit

[8562] The Memory Access Unit (MAU) takes memory access control signals and turns them into RAM accesses and flash access strobed signals with appropriate duration.

[8563] A new transaction is given by MRUNewTrans. The address to be read from or written to is on MRUAdr, which is a nybble-based address. The MRUAdr (13 bits) is used as-is for Flash addressing. When MRURAMSel=1, the RAM address (RAMAdr) is taken from bits 9-3 of MRUAdr. The data to be written is on MRUData.
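The derivation of RAMAdr from MRUAdr is a plain bit-field extraction, sketched below in Python. The helper name is an assumption for illustration; the field position (bits 9-3 of the 13-bit nybble address) is as stated above.

```python
def ram_adr_from_mru_adr(mru_adr: int) -> int:
    """Return the 7-bit RAM word address held in bits 9-3 of MRUAdr."""
    return (mru_adr >> 3) & 0x7F
```

The three low-order bits select a nybble within a 32-bit word, so dropping them leaves the word index that the RAM expects.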

[8564] The return value MAURdy is set when the MAU is capable of receiving a new transaction the following cycle. Thus MAURdy will be 1 during the final cycle of a flash or RAM access, and should be 1 when the MAU is idle. MAURdy should only be 0 during startup or when a transaction has yet to finish.

[8565] When MRURAMSel=1, the access is to RAM, and MRUMode has the following interpretation:

TABLE 384 Interpretation of MRUMode for RAM accesses
bits | action
xx0 | doWrite
xx1 | doRead

[8566] When MRURAMSel=0, the access is to flash. If MRUTestWE=0, then the access is to regular flash memory, as given by MRUMode:

TABLE 385 Interpretation of MRUMode for regular flash accesses
bits 1-0 | action when MRUMode₂=0 | action when MRUMode₂=1
00 | doWrite (main memory) | doWrite (info block)
01 | doRead (main memory) | doRead (info block)
10 | doErasePage (main memory) | doErasePage (info block)
11 | doEraseDevice (main memory) | doEraseDevice (both blocks)

[8567] If MRUTestWE is 1, then MRUMode₂ will also be 0, and the access is to a flash test register, as given by MRUMode:

TABLE 386 Interpretation of MRUMode for flash test register write accesses
bits | action
xx1 | If (MRUData₃ = 0), tie the flash IP test register to its reset state. If (MRUData₃ = 1), take the flash IP test register out of reset state, and write MRUData⁸⁻⁴ to the 5-bit flash test register within the flash IP (SFC008_08B9_HE).
x1x | Write MRUData²⁸⁻⁹ to the internal 20-bit alternate-counter-source register flashTime, and MRUData₂₉ to the corresponding 1-bit test register timerSel.

[8568] 19.1 Implementation

[8569] The MAU consists of logic that calculates MAURdy, and additional logic that produces the various strobed signals according to the TSMC Flash memory SFC008_08B9_HE; refer to this datasheet [4] for detailed timing diagrams. Both main memory and information blocks can be accessed in the Flash. The Flash test modes are also supported as described in [5] and general application information is given in [6].

[8570] The MAU can be considered to be a RAM control block and a flash control block, with appropriate action selected by MRURAMSel. For all modes except read, the Flash requires wait states (which are implemented with a single counter) during which it is possible to access the RAM. Only 1 transaction may be pending while waiting for the wait states to expire. Multiple bytes may be written to Flash without exiting the write mode.

[8571] The MAU ensures that only valid control sequences meeting the timing requirements of the Flash memory are provided. A write time-out is included which ensures the Flash cannot be left in write mode indefinitely; this is used when the Flash is programmed via the IO Unit to ensure the X address does not change while in write mode. Otherwise, other units should ensure that when writing bytes to Flash, the X address does not change. The X address is held constant by the MAU during write and page erase modes to protect the Flash. If an X address change is detected by the MAU during a Flash write sequence, it will exit write mode, allow the X address to change, and re-enter write mode. Thus, the data will still be written to Flash but it will take longer.

[8572] When either the Flash or RAM is not being used, the MAU sets the control signals to put the particular memory type into standby to minimise power consumption.

[8573] The MAU assumes no new transactions can start while one is in progress, and all inputs must remain constant until the MAU is ready.

[8574] 19.2 Flash Test Mode

[8575] The MAU also enables the Flash test mode register to be programmed, which allows various production tests to be carried out. If MRUTestWE=1, transactions are directed towards the test mode register. Most of the tests use the same control sequences that are used for normal operation except that one time value needs to be changed. This is provided by the flashTime register, which can be written to by the CPU, allowing the timer to be set to a range of values up to more than 1 second. A special control sequence is generated when the test mode register is set to 0x1E and is initiated by writing to the Flash.

[8576] Note that on reset, timerSel and flashTime are both cleared to 0. The 5-bit flash test register within the TSMC flash IP is also reset by setting TMR=1. When MRUTestWE=1, any open write sequence is closed even if the write is not to the 5-bit flash test register within the TSMC flash IP.

[8577] 19.3 Flash Power Failure Protection

[8578] Power could fail at any time; the most serious consequence would be if this occurred during writing to the Flash and data became corrupted in a location other than the one being written to. The MAU will protect the Flash by switching off the charge pump (high voltage supply used for programming and erasing) as soon as the power starts to fail. After a time delay of about 5 μs (programmable), to allow the discharge of the charge pump, the QA chip will be reset whether or not the power supply recovers.

[8579] 19.4 Flash Access State Machine

[8580] 19.5 Interface

TABLE 387 MAU interface description
Signal name | I/O | Description
Clk | In | System clock.
RstL | In | System reset (active low).
MAURAMEn | In | Flag indicating whether the external user needs access to the RAM at a gross level (e.g. the CPU is active and therefore may want RAM access). 1 = wants access available, 0 = don't want.
MRUNewTrans | In | Flag indicating MRU wishes to start a new transaction. May only be asserted (=1) when MAURdy = 1. All inputs below must be held constant until MAU is ready.
MRURAMSel | In | 1 = RAM, 0 = Flash.
MRUMode2-0 | In | Type of transaction to be performed.
MRUAdr12-0 | In | Memory address from the MRU.
MRUDataOut31-0 | In | Data used to control and set test modes and timing.
MRUTestWE | In | Flag indicating test mode transactions.
PwrFailing | In | Flag indicating possible power failure in progress.
MAURdy | Out | The MAU is ready when MAURdy = 1. It is always set for RAM transactions and held low during Flash wait states.
RAMOutEn | Out | 0 = enable the RAM to read or write this cycle (i.e. active low); 1 = disable the RAM this cycle (saves power, memory is intact).
RAMWE | Out | RAM write when RAMWE = 0 (Artisan Synchronous SRAM).
MemClk | Out | Inverted system clock to the RAM (required to meet timing).
FlashCtrl8-0 | Out | Control signals to the Flash. IFREN = information block enable, not used always = 0; XE = X address enable; YE = Y address enable; SE = sense amplifier enable (read only); OE = output enable (read only), hi-Z when OE = 0; PROG = program (write bytes); NVSTR = enables all write and erase modes; ERASE = page erase mode; MAS1 = mass erase mode.
TMR | Out | TMR = Register reset for test mode.
RAMAdr6-0 | Out | RAM address in the range 0 to 95.
FlashAdr12-0 | Out | Flash address, full range.
MAURstOutL | Out | Activates the global reset, RstL.

[8581] 19.6 Calculation of Timer Values

[8582] Set and calculate timer initialisation values based on Flash data sheet values, clock period and clock range.

# Note: Flash data sheet gives minimum timings
# Delays greater than 1 clock cycle
clock_per = 100  # ns
Flash_Tnvs = 7500  # ns
Flash_Tnvh = 7500  # ns
Flash_Tnvh1 = 150  # us
Flash_Tpgs = 15  # us
Flash_Tpgh = 100  # ns
Flash_Tprog = 30  # us
Flash_Tads = 100  # ns
Flash_Tadh = 30  # us  # Byte write timeout
Flash_Trcv = 1500  # ns
Flash_Thv = 6  # ms  # Not currently used
Flash_Terase = 30  # ms
Flash_Tme = 300  # ms
# Derive maximum counts (−1 since state machine is synchronous)
FLASH_NVS = Flash_Tnvs/clock_per − 1
FLASH_NVH = Flash_Tnvh/clock_per − 1
FLASH_NVH1 = Flash_Tnvh1*1000/clock_per − 1
FLASH_PGS = Flash_Tpgs*1000/clock_per − 1
FLASH_PGH = Flash_Tpgh/clock_per − 1
FLASH_PROG = Flash_Tprog*1000/clock_per − 1
FLASH_ADS = Flash_Tads/clock_per − 1
FLASH_ADH = Flash_Tadh*1000/clock_per − 1
FLASH_ADH_AND_WRITE_PGH = FLASH_ADH + FLASH_PGH + 1  # note is +1
FLASH_RCV = Flash_Trcv/clock_per − 1
FLASH_HV = Flash_Thv*1000000/clock_per − 1
FLASH_ERASE = Flash_Terase*1000000/clock_per − 1
FLASH_ME = Flash_Tme*1000000/clock_per − 1
count_size = 24  # Number of bits in timer counter (newCount) determined by Tme
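The count derivation above can be checked numerically. The Python sketch below recomputes a few of the values for the stated 100 ns clock period; the helper names `count_from_ns`, `count_from_us` and `count_from_ms` are assumptions introduced for illustration, and integer division is assumed because every listed timing is a whole multiple of the clock period.

```python
CLOCK_PER_NS = 100  # clock_per from the listing above


def count_from_ns(t_ns: int, clock_per_ns: int = CLOCK_PER_NS) -> int:
    """Timer load value for a delay in ns (-1: the state machine is synchronous)."""
    return t_ns // clock_per_ns - 1


def count_from_us(t_us: int) -> int:
    return count_from_ns(t_us * 1000)


def count_from_ms(t_ms: int) -> int:
    return count_from_ns(t_ms * 1_000_000)


FLASH_NVS = count_from_ns(7500)
FLASH_PGH = count_from_ns(100)
FLASH_PROG = count_from_us(30)
FLASH_ADH = count_from_us(30)
FLASH_ADH_AND_WRITE_PGH = FLASH_ADH + FLASH_PGH + 1  # note the +1
FLASH_ME = count_from_ms(300)

# Width needed to hold the largest count (the mass erase time Tme)
bits_needed = FLASH_ME.bit_length()
```

With these inputs the mass erase count is 2,999,999, which needs 22 bits, so it fits within the 24-bit count_size given above.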

[8583] 19.7 Defaults

[8584] Defaults to use when no action is specified.

FlashTransPendingSet = 0
FlashTransPendingReset = 0
TMRSet = 0
TMRRst = 0
STLESet = 0
STLERst = 0
TestTimeEn = 0
IFREN = FlashXadr₇
XE = 0
YE = 0
SE = 0
OE = 0
PROG = 0
NVSTR = 0
ERASE = 0
MAS1 = 0
MAURstOutL = 1
If (accessCount ≠ 0)
  newCount = accessCount − 1  # decrement unless instructed otherwise
Else
  newCount = 0
EndIf

[8585] 19.8 Reset

[8586] Initialise state and counter registers.

# asynchronous reset (active low)
state ← idle
accessCount ← 1
countZ ← 0
XadrReg ← 0
FlashTransPending ← 0
TestTime ← 0
TMR ← 1
STLEFlag ← 0

[8587] 19.9 State Machine

[8588] The state machine generates sequences of timed waveforms to control the operation of the Flash memory.

idle
  FlashTransPendingReset = 1
  If (somethingToDo)  # Flash starting conditions
    If (MRUTestWE)
      nextState = TM0
    Else
      Switch (MRUModeint)
        Case doWrite:
          nextState = writeNVS
          newCount = FLASH_NVS
        Case doRead:
          YE = 1
          SE = 1
          OE = 1
          XE = 1
          nextState = idle
        Case doErasePage:
          nextState = pageErase
          newCount = FLASH_NVS
        Case doEraseDevice:
          nextState = massErase
          newCount = FLASH_NVS
      EndSwitch
    EndIf
  EndIf

[8589] 19.9.1 Flash Page Erase

[8590] The following pseudocode illustrates the Flash page erase sequence.

pageErase
  ERASE = 1
  XE = 1
  If (¬PwrFailing)
    If (countZ)
      newCount = FLASH_ERASE
      nextState = pageEraseERASE
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

pageEraseERASE
  ERASE = 1
  NVSTR = 1
  XE = 1
  If (¬PwrFailing)
    If (countZ)
      newCount = FLASH_NVH
      nextState = pageEraseNVH
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

pageEraseNVH
  NVSTR = 1
  XE = 1
  If (¬PwrFailing)
    If (countZ)
      newCount = FLASH_RCV
      nextState = RCVPM
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

RCVPM
  If (countZ)
    nextState = idle  # exit
  EndIf

[8591] 19.9.2 Flash Mass Erase

[8592] The following pseudocode illustrates the Flash mass erase sequence.

massErase
  MAS1 = 1
  ERASE = 1
  XE = 1
  If (countZ)
    If (¬TestTime₂₀)
      newCount = FLASH_ME
    Else
      newCount = TestTime¹⁹⁻⁰ | 0000
    EndIf
    nextState = massEraseME
  EndIf

massEraseME
  MAS1 = 1
  ERASE = 1
  NVSTR = 1
  XE = 1
  If (countZ)
    newCount = FLASH_NVH1
    nextState = massEraseNVH1
  EndIf

massEraseNVH1
  MAS1 = 1
  NVSTR = 1
  XE = 1
  If (countZ)
    newCount = FLASH_RCV
    nextState = RCVPM
  EndIf

[8593] 19.9.3 Flash Byte Write

[8594] The following pseudocode illustrates the Flash byte write sequence.

writeNVS
  PROG = 1
  XE = 1
  If (¬PwrFailing)
    If (countZ)
      If (¬STLEFlag)
        newCount = FLASH_PGS
        nextState = writePGS
      Else
        newCount = TestTime¹⁹⁻⁰ | 0000
        nextState = STLE0
      EndIf
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

writePGS
  PROG = 1
  NVSTR = 1
  XE = 1
  If (¬PwrFailing)
    If (countZ)
      newCount = FLASH_ADS
      nextState = writeADS
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

writeADS  # Add Tads to Tpgs
  PROG = 1
  NVSTR = 1
  XE = 1
  FlashTransPendingReset = 1
  If (¬PwrFailing)
    If (countZ)
      If (¬TestTime₂₀)
        newCount = FLASH_PROG
      Else
        newCount = TestTime¹⁹⁻⁰ | 0000
      EndIf
      nextState = writePROG
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

writePROG
  PROG = 1
  NVSTR = 1
  YE = 1
  XE = 1
  If (¬PwrFailing)
    If (countZ)
      newCount = FLASH_ADH_AND_WRITE_PGH
      nextState = writeADH
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help2
  EndIf

writeADH
  PROG = 1
  NVSTR = 1
  XE = 1
  FlashTransPendingSet = somethingToDo
  If (¬PwrFailing)
    If (¬FlashNewTrans)
      If (countZ)  # Graceful exit after timeout
        newCount = FLASH_NVH
        nextState = writeNVH
      EndIf
    Else  # Do something as there is a new transaction
      If ((MRUModeint = doWrite) ∧ (¬XadrCh))
        newCount = FLASH_ADS  # Write another byte
        nextState = writeADS
      Else
        newCount = FLASH_NVH  # Exit as new trans is not Flash write
        nextState = writeNVH
      EndIf
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

writeNVH
  NVSTR = 1
  XE = 1
  FlashTransPendingSet = somethingToDo
  If (¬PwrFailing)
    If (countZ)
      newCount = FLASH_RCV
      nextState = RCV
    EndIf
  Else
    newCount = TestTime¹⁹⁻⁰
    nextState = Help1
  EndIf

RCV  # wait til we're allowed to do another transaction
  FlashTransPendingSet = somethingToDo
  If (countZ)
    nextState = idle
  EndIf

[8595] 19.9.4 Test Mode Sequence

[8596] The following pseudocode illustrates the test mode sequence. TM0 # Needed this due to delay on TMR IFREN = 0 nextState = idle # default If (MRUModeint₁) TestTimeEn = 1 EndIf If (MRUModeint₀) If (

MRUDataOut₃) TMRSet = 1 STLERst = 1 # Reset flag as leaving test mode Else If (MRUDataOut⁸⁻⁴ = 11110) STLESet = 1 Else STLERst = 1 EndIf TMRRst = 1 nextState = TM1 # Will get priority EndIf EndIf

TM1
  IFREN = 0
  nextState = TM2
TM2
  NVSTR = 1
  SE = 1
  IFREN = 0
  nextState = TM3
TM3
  NVSTR = 1
  SE = 1
  MAS1 = MRUDataOut₄
  IFREN = MRUDataOut₅
  XE = MRUDataOut₆
  YE = MRUDataOut₇
  ERASE = MRUDataOut₈
  TMRSet = 1
  nextState = TM4
TM4
  NVSTR = 1
  SE = 1
  MAS1 = MRUDataOut₄
  IFREN = MRUDataOut₅
  XE = MRUDataOut₆
  YE = MRUDataOut₇
  ERASE = MRUDataOut₈
  TMRRst = 1
  nextState = TM5
TM5
  NVSTR = 1
  SE = 1
  MAS1 = MRUDataOut₄
  IFREN = MRUDataOut₅
  XE = MRUDataOut₆
  YE = MRUDataOut₇
  ERASE = MRUDataOut₈
  nextState = TM6
TM6
  NVSTR = 1
  SE = 1
  nextState = idle

[8597] 19.9.5 Reverse Tunneling and Thin Oxide Leak Test

[8598] The following pseudocode shows the reverse tunneling and thin oxide leak test sequence.

STLE0
  XE = 1
  PROG = 1
  NVSTR = 1
  If (countZ)
    newCount = FLASH_NVH
    nextState = STLE1
  EndIf
STLE1
  XE = 1
  NVSTR = 1
  If (countZ)
    newCount = FLASH_RCV
    nextState = STLE2
  EndIf
STLE2
  If (countZ)
    nextState = idle
  EndIf
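The three-state leak test sequence can be illustrated with a small simulation. This is only a sketch under stated assumptions: countZ is taken to mean that a down-counter has reached zero, assigning newCount is taken to reload that counter, and the numeric values chosen for FLASH_NVH and FLASH_RCV are placeholders rather than values from this specification.

```python
# Placeholder cycle counts (hypothetical; the real timing constants are not given here).
FLASH_NVH = 3
FLASH_RCV = 2


def stle_step(state, count):
    """Advance the sketch by one clock; returns (next_state, next_count, outputs)."""
    count_z = (count == 0)  # assumed meaning of countZ
    outputs = {"XE": 0, "PROG": 0, "NVSTR": 0}
    next_state = state
    next_count = count - 1 if count > 0 else 0  # counter runs down to zero

    if state == "STLE0":
        outputs.update(XE=1, PROG=1, NVSTR=1)
        if count_z:
            next_count, next_state = FLASH_NVH, "STLE1"
    elif state == "STLE1":
        outputs.update(XE=1, NVSTR=1)
        if count_z:
            next_count, next_state = FLASH_RCV, "STLE2"
    elif state == "STLE2":
        if count_z:
            next_state = "idle"
    return next_state, next_count, outputs


def run_stle(initial_count):
    """Return the list of states visited, one entry per clock, until idle."""
    state, count, trace = "STLE0", initial_count, []
    while state != "idle":
        trace.append(state)
        state, count, _ = stle_step(state, count)
    return trace
```

Calling `run_stle(1)` walks through STLE0, STLE1 and STLE2 in order, spending one more clock in each state than the count loaded on entry.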

[8599] 19.9.6 Emergency Instructions

[8600] The following pseudocode shows the states used for emergency situations such as when power is failing.

Help1
  # MAURdy −> 0 to hold MAU inputs constant, if not too late
  XE = 1
  If (countZ)
    nextState = Goodbye
  EndIf
Help2
  # MAURdy −> 0 to hold MAU inputs constant, if not too late
  XE = 1
  YE = 1
  If (countZ)
    nextState = Goodbye
  EndIf
Goodbye
  XE = 1 # Prevents Flash timing violation
  MAURstOutL = 0 # Reset whole chip whether power fails or recovers
  # nothing else to do

[8601] 20 Analogue Unit

[8602] This section specifies the mandatory blocks of Section 11.1 on page 965 in a way that allows some freedom in the detailed implementation.

[8603] Circuits need to operate over the temperature range −40° C. to +125° C.

[8604] The unit provides power on reset, protection of the Flash memory against erroneous writes during power down (in conjunction with the MAU), and the system clock SysClk.

[8605] 20.1 Voltage Budget

[8606] The table below shows the key thresholds for V_(DD) which define the requirements for power on reset and normal operation.

TABLE 388 V_(DD) limits
VDD parameter   Description                              Voltage (V)
VDDFTmax        Flash test maximum                       3.6¹³
VDDFTtyp        Flash test typical                       3.3
VDDFTmin        Flash test minimum                       3.0
VDDmax          Normal operation maximum (typ + 10%)     2.75¹⁴
VDDtyp          Normal operation typical                 2.5
VDDmin          Normal operation minimum (typ − 5%)      2.375
VDDPORmax       Power on reset maximum                   2.0¹⁵
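The normal-operation limits in Table 388 follow directly from the typical value and the stated percentages. The short sketch below reproduces that arithmetic and classifies a supply voltage against the limits; the region names returned by `supply_region` are illustrative labels, not terms from the specification.

```python
VDD_TYP = 2.5        # V, normal operation typical (Table 388)
VDD_POR_MAX = 2.0    # V, power on reset maximum (Table 388)

VDD_MAX = VDD_TYP * 1.10   # typ + 10%, about 2.75 V
VDD_MIN = VDD_TYP * 0.95   # typ - 5%, about 2.375 V


def supply_region(vdd):
    """Classify a supply voltage against the Table 388 limits (labels are illustrative)."""
    if vdd < VDD_POR_MAX:
        return "below_por_max"
    if vdd < VDD_MIN:
        return "between_por_max_and_min"
    if vdd <= VDD_MAX:
        return "normal"
    return "above_max"


if __name__ == "__main__":
    for v in (1.5, 2.2, 2.5, 3.3):
        print(f"{v:.2f} V -> {supply_region(v)}")
```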

[8607] 20.2 Voltage Reference

[8608] This circuit generates a stable voltage that is approximately independent of PVT (process, voltage, temperature) and will typically be implemented as a bandgap. Usually, a startup circuit is required to avoid the stable V_(bg)=0 condition. The design should aim to minimise the additional voltage above V_(bg) required for the circuit to operate. An additional output, BGOn, will be provided and asserted when the bandgap has started, indicating to other blocks that the output voltage is stable and may be used.

TABLE 389 Bandgap target performance
Parameter  Conditions  Min  Typ  Max  Units
Vbg¹⁶  typical  1.2  1.23  1.26  V
IDD  typical  50  μA
Vstart  worst case  1.6  V
Iout  10  nA
Vtemp  +0.1  mV/°C

[8609] 20.3 Power Detection Unit

[8610] Only under voltage detection is described here. It is required to provide two outputs:

[8611] underL controls the power on reset; and

[8612] PwrFailing indicates possible failure of the power supply.

[8613] Both signals are derived by comparing scaled versions of V_(DD)against the reference voltage V_(bg).

[8614] 20.3.1 V_(DD) Monotonicity

[8615] The rising and falling edges of V_(DD) (from the external power supply) shall be monotonic in order to guarantee correct operation of power on reset and power failing detection. Random noise may be present but should have a peak to peak amplitude of less than the hysteresis of the comparators used for detection in the PDU.

[8616] 20.3.2 Under Voltage Detection Unit

[8617] The underL signal generates the global reset to the logic, which should be de-asserted when the supply voltage is high enough for the logic and analogue circuits to operate. Since the logic reset is asynchronous, it is not necessary to ensure the clock is active before releasing the reset or to include any delay.

[8618] The QA chip logic will start as soon as the power on reset is released, so this should only be done when the conditions of supply voltage and clock frequency are within limits for the correct operation of the logic.

[8619] The power on reset signal shall not be triggered by narrow spikes (<100 ns) on the power supply. Some immunity should be provided to power supply glitches although, since the QA chip may be under attack, any reset delay should be kept short. The unit should not be triggered by logic dynamic current spikes resulting in short voltage spikes due to bond wire and package inductance. On the rising edge of V_(DD), the maximum threshold for de-asserting the signal shall be when V_(DD)>V_(DDmin). On the falling edge of V_(DD), the minimum threshold for asserting the signal shall be V_(DD)<V_(DDPORmax).
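The rising and falling threshold behaviour described above amounts to a comparator with hysteresis. The following is a minimal behavioural sketch, assuming underL is active low (0 means reset asserted) and using illustrative thresholds of 2.2 V rising and 2.05 V falling, chosen only to lie between V_(DDPORmax) and V_(DDmin); the class name and the thresholds are assumptions, and the spike filtering is not modelled.

```python
class UnderVoltageDetector:
    """Behavioural model of an under-voltage comparator with hysteresis (sketch only)."""

    def __init__(self, v_rise=2.2, v_fall=2.05):
        if v_fall >= v_rise:
            raise ValueError("falling threshold must be below rising threshold")
        self.v_rise = v_rise   # reset released when VDD rises above this
        self.v_fall = v_fall   # reset re-asserted when VDD falls below this
        self.under_l = 0       # assumed active low: 0 = reset asserted

    def sample(self, vdd):
        """Update the detector with one VDD sample and return underL."""
        if self.under_l == 0 and vdd > self.v_rise:
            self.under_l = 1
        elif self.under_l == 1 and vdd < self.v_fall:
            self.under_l = 0
        return self.under_l


def trace_under_l(samples, **kwargs):
    """Run a fresh detector over a sequence of VDD samples."""
    det = UnderVoltageDetector(**kwargs)
    return [det.sample(v) for v in samples]
```

With this model, a supply that rises past the upper threshold and then dips slightly (but stays above the lower threshold) does not re-assert the reset, which is the purpose of the hysteresis.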

[8620] The reset signal must be held low long enough (T_(pwmin)) to ensure all flip-flops are reset. The standard cell data sheet [7] gives a figure of 0.73 ns for the minimum width of the reset pulse for all flip-flop types.

[8621] 2 bits of trimming (trim₁₋₀) will be provided to take up all of the error in the bandgap voltage. This will only affect the assertion of the reset during power down since the power on default setting must be used during power up.

[8622] Although the reference voltage cannot be directly measured, it is compared against V_(DD) in the PDU. The state of the power on reset signal can be inferred by trying to communicate with the chip through the serial bus. By polling the chip and slowly increasing V_(DD), a point will be reached where the power on reset is released, allowing the serial bus to operate; this voltage should be recorded. As V_(DD) is lowered, it will cross the threshold which asserts the reset signal. The power on default is set to the lowest voltage that can be trimmed (which gives the maximum hysteresis). This voltage should be recorded (or it may be sufficient to estimate it from the reset release voltage recorded above). V_(DD) is then increased above the reset release threshold and the PDU trim adjusted to the setting closest to V_(DDPORmax). V_(DD) should then be lowered and the threshold at which the reset is re-asserted confirmed.

TABLE 390 Power on reset target performance
Parameter  Conditions  Min  Typ  Max  Units
Vthrup  T = 27° C.  2.0  2.375  V
Vthrdn  T = 27° C.  2.0  2.1  V
Vhystmin  16  mV
IDD  5  μA
Tspike  100  ns
Vminr  0.5  V
Tpwmin  1  ns

[8623] Power on Reset Behaviour

[8624] The signal PwrFailing will be used to protect the Flash memory by turning off the charge pump during a write or page erase if the supply voltage drops below a certain threshold. The charge pump is expected to take about 5 μs to discharge. The PwrFailing signal shall be protected against narrow spikes (<100 ns) on the power supply.

[8625] The nominal threshold for asserting the signal needs to be in the range V_(DDPORmax)<V_(DDPFtyp)<V_(DDmin), so it is chosen to be asserted when V_(DD)<V_(DDPFtyp)=V_(DDPORmax)+200 mV. This implies a V_(DD) slew rate limitation, which must be <200 mV/5 μs, to ensure enough time to detect that power is failing before the supply drops too low and the reset is activated. This requirement must be met in the application by provision of adequate supply decoupling or other means to control the rate of descent of V_(DD).

TABLE 391 Power failing detection target performance
Parameter  Conditions  Min  Typ  Max  Units
Vthr  T = 27° C.  2.1  2.2  2.3  V¹⁷
Vhyst  16  mV
IDD  5  μA
Tspike  100  ns
Vminr  0.5  V
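The threshold and slew rate figures above can be checked with a few lines of arithmetic. The sketch below also estimates a minimum decoupling capacitance for a given load current; that last step is an added assumption (the supply is taken to be disconnected, with the chip drawing a constant current from the decoupling capacitor alone, so that dV/dt = I/C), and the function name and example load current are hypothetical.

```python
V_DD_POR_MAX = 2.0                 # V, from Table 388
PF_MARGIN_V = 0.2                  # V, the 200 mV margin above V_DDPORmax
CHARGE_PUMP_DISCHARGE_S = 5e-6     # s, approximate charge pump discharge time

# Nominal power-failing threshold: V_DDPFtyp = V_DDPORmax + 200 mV
V_DDPF_TYP = V_DD_POR_MAX + PF_MARGIN_V

# Maximum allowed rate of descent of V_DD: 200 mV in 5 us
MAX_SLEW_V_PER_S = PF_MARGIN_V / CHARGE_PUMP_DISCHARGE_S


def min_decoupling_farads(load_current_a):
    """Smallest capacitance that keeps dV/dt = I/C below the slew limit (assumed model)."""
    if load_current_a < 0:
        raise ValueError("load current must be non-negative")
    return load_current_a / MAX_SLEW_V_PER_S


if __name__ == "__main__":
    print(f"V_DDPFtyp = {V_DDPF_TYP:.2f} V")
    print(f"max slew  = {MAX_SLEW_V_PER_S / 1e3:.1f} mV/us")
    # 1 mA is a hypothetical load current, used only to show the calculation.
    print(f"C_min at 1 mA = {min_decoupling_farads(1e-3) * 1e9:.1f} nF")
```

The computed threshold of 2.2 V agrees with the typical Vthr value in Table 391, and the slew limit works out to 40 mV/μs.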

[8626] 2 bits of trimming (trim₁₋₀) will be provided to take up all of the error in the bandgap voltage.

[8627] 20.4 Ring Oscillator

[8628] SysClk is required to be in the range 7-14 MHz throughout the lifetime of the circuit provided V_(DD) is maintained within the range V_(DDmin)<V_(DD)<V_(DDmax). The 2:1 range is derived from the programming time requirements of the TSMC Flash memory. If this range is exceeded, the useful lifetime of the Flash may be reduced.

[8629] The first version of the QA chip, without physical protection, does not require the addition of random jitter to the clock. However, it is recommended that the ring oscillator be designed in such a way as to allow for the addition of jitter later on with minimal modification. In this way, the un-trimmed centre frequency would not be expected to change.

[8630] The initial frequency error must be reduced to remain within the range 10 MHz/1.41 to 10 MHz×1.41 allowing for variation in:

[8631] voltage

[8632] temperature

[8633] ageing

[8634] added jitter

[8635] errors in frequency measurement and setting accuracy

[8636] The range budget must be partitioned between these variables.
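One way to reason about partitioning the budget is to treat each contribution as a fractional frequency tolerance and require the worst-case product to stay within the factor of 1.41. The sketch below does this; the multiplicative model, the tolerance values and all names are assumptions for illustration only, not figures from this specification.

```python
import math

F_NOM_MHZ = 10.0
RANGE_FACTOR = 1.41   # allowed excursion either side of nominal (1.41 squared is about 2)


def frequency_limits():
    """Lower and upper SysClk limits implied by the 1.41 factor."""
    return F_NOM_MHZ / RANGE_FACTOR, F_NOM_MHZ * RANGE_FACTOR


def worst_case_factor(tolerances):
    """Worst-case frequency ratio if every fractional tolerance stacks the same way."""
    return math.prod(1.0 + t for t in tolerances.values())


def budget_ok(tolerances):
    return worst_case_factor(tolerances) <= RANGE_FACTOR


# Hypothetical partition of the budget (fractions of nominal frequency).
EXAMPLE_BUDGET = {
    "voltage": 0.05,
    "temperature": 0.10,
    "ageing": 0.05,
    "added_jitter": 0.02,
    "measurement_and_setting": 0.05,
}

if __name__ == "__main__":
    lo, hi = frequency_limits()
    print(f"limits: {lo:.2f} MHz to {hi:.2f} MHz")
    print(f"worst-case factor: {worst_case_factor(EXAMPLE_BUDGET):.3f}")
    print("budget ok:", budget_ok(EXAMPLE_BUDGET))
```

The limits evaluate to roughly 7.09 MHz and 14.1 MHz, which is consistent with the 7-14 MHz SysClk range.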

[8637] FIG. 411. Ring oscillator block diagram

[8638] The above arrangement allows the oscillator centre frequency to be trimmed since the bias current of the ring oscillator is controlled by the DAC. SysClk is derived by dividing the oscillator frequency by 5, which makes the oscillator smaller and allows the duty cycle of the clock to be better controlled.

[8639] 20.4.1 DAC (Programmable Current Source)

[8640] Using V_(bg), this block sources a current that can be programmed by the Trim signal. 6 of the available 8 trim bits will be used (trim₇₋₂) giving a clock adjustment resolution of about 250 kHz. The range of current should be such that the ring oscillator frequency can be adjusted over a 4 to 1 range.

TABLE 392 Programmable current source target performance
Parameter  Conditions  Min  Typ  Max  Units
Iout  Trim₇₋₂ = 0  5  μA
      Trim₇₋₂ = 32  12.5
      Trim₇₋₂ = 63  20
Vrefin  1.23  V
Rout  Trim₇₋₂ = 63  2.5  MΩ
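The trim figures can be cross-checked numerically. The sketch below assumes the DAC is linear between the Table 392 end points (5 μA at code 0, 20 μA at code 63) and uses the current-to-frequency gain of 1 MHz/μA listed as KI in Table 393; the linearity, the function names and the interpretation of the step size are assumptions, and the text does not say whether the quoted resolution refers to the oscillator or to the divided clock.

```python
I_MIN_UA = 5.0          # Iout at Trim[7:2] = 0  (Table 392)
I_MAX_UA = 20.0         # Iout at Trim[7:2] = 63 (Table 392)
CODE_MAX = 63           # 6-bit code
K_I_MHZ_PER_UA = 1.0    # KI from Table 393


def split_trim(trim8):
    """Split the 8-bit trim value into (DAC code from bits 7..2, PDU trim from bits 1..0)."""
    if not 0 <= trim8 <= 0xFF:
        raise ValueError("trim must fit in 8 bits")
    return (trim8 >> 2) & 0x3F, trim8 & 0x03


def dac_current_ua(code):
    """Output current for a 6-bit code, assuming a linear DAC between the table end points."""
    if not 0 <= code <= CODE_MAX:
        raise ValueError("code must be in 0..63")
    return I_MIN_UA + (I_MAX_UA - I_MIN_UA) * code / CODE_MAX


def step_khz():
    """Frequency change per code step under the linear assumption."""
    return (I_MAX_UA - I_MIN_UA) / CODE_MAX * K_I_MHZ_PER_UA * 1000.0


if __name__ == "__main__":
    code, pdu_trim = split_trim(0b10000011)
    print(code, pdu_trim, f"{dac_current_ua(code):.2f} uA", f"{step_khz():.0f} kHz/step")
```

Under these assumptions the step is about 238 kHz, in line with the stated resolution of about 250 kHz, the mid-scale current is close to the 12.5 μA typical value, and the 5 μA to 20 μA span gives the 4 to 1 adjustment range.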

[8641] 20.4.2 Ring Oscillator Circuit

TABLE 393 Ring oscillator target performance
Parameter  Conditions  Min  Typ  Max  Units
Fosc¹⁸  7  10  14  MHz
IDD  10  μA
KI  1  MHz/μA
KVDD  +200  kHz/V
KT  +30  kHz/°C
Vstart  1.5  V

[8642] 20.4.3 Div5

[8643] The ring oscillator will be prescaled by 5 to obtain the nominal 10 MHz clock. An asynchronous design may be used to save power. Several divided clock duty cycles are obtainable, e.g. 4:1, 3:2 etc. To ease timing requirements for the standard cell logic block, the following clock will be generated; most flip-flops will operate on the rising edge of the clock, allowing negative edge clocking to meet memory timing.

TABLE 394 Div5 target performance
Parameter  Conditions  Min  Typ  Max  Units
Fmax  Vdd = 1.5 V  100  MHz
IDD  10  μA
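A divide-by-5 prescaler and its available duty cycles can be illustrated with a simple counter model. This is a behavioural sketch only: it uses a synchronous counter rather than the asynchronous design the text permits, and the `high_counts` parameter is a hypothetical knob, since this section does not state which duty cycle is actually chosen.

```python
def div5(n_input_cycles, high_counts=2):
    """Return the divided clock level for each input cycle.

    The output is high for `high_counts` of every 5 input cycles, so
    high_counts=2 gives a 2:3 high-to-low ratio and high_counts=1 gives 1:4.
    """
    if not 1 <= high_counts <= 4:
        raise ValueError("high_counts must be between 1 and 4")
    out = []
    count = 0
    for _ in range(n_input_cycles):
        out.append(1 if count < high_counts else 0)
        count = (count + 1) % 5   # wrap every 5 input cycles
    return out


def rising_edges(levels):
    """Count 0 -> 1 transitions, treating the level before the first sample as 0."""
    prev, edges = 0, 0
    for level in levels:
        if level == 1 and prev == 0:
            edges += 1
        prev = level
    return edges
```

For example, `div5(10)` yields two output periods over ten input cycles, so an oscillator running at five times the nominal SysClk frequency produces one rising edge per five oscillator cycles.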

[8644] 20.5 Power on Reset

[8645] This block combines the overL (omitted from the current version), underL and MAURstOutL signals to provide the global reset. MAURstOutL is delayed by one clock cycle to ensure a reset generated when this signal is asserted has at least this duration, since the reset deasserts the signal itself. It should be noted that the register, with active low reset RN, is the only one in the QA chip not connected to RstL.

[8646] [4] TSMC, Oct. 1, 2000, SFC0008_(—)08B9_HE, 8K×8 Embedded Flash Memory Specification, Rev 0.1.

[8647] [5] TSMC (design service division), Sep. 10, 2001, 0.25 um Embedded Flash Test Mode User Guide, V0.3.

[8648] [6] TSMC (EmbFlash product marketing), Oct. 19, 2001, 0.25 um Application Note, V2.2.

[8649] [7] Artisan Components, January 1999, Process Perfect Library Databook 2.5-Volt Standard Cells, Rev 1.0.

[8650] Other Applications for Protocols and QA Chips

[8651] 1 Introduction

[8652] In its preferred form, the QA chip [1] is a programmable 32 bit microprocessor with security features (8,000 gates, 3 k bits of RAM and 8 kbytes of flash memory for program and non-volatile data storage). It is manufactured in a 0.25 um CMOS process.

[8653] Physically, the chip is mounted in a 5 pin SOT23 plastic package and communicates with external circuitry via a two pin serial bus.

[8654] The QA chip was designed for authenticating consumable usage and performance upgrades in printers and associated hardware.

[8655] Because of its core functionality and programmability, the QA chip can also be used in applications that differ significantly from its original one. This document seeks to identify some of those areas.

[8656] 3 Applications Overview

[8657] Applications include:

[8658] Regular EEPROM

[8659] Secure EEPROM

[8660] General purpose MPU with security features

[8661] Security coprocessor for microprocessor system

[8662] Security coprocessor for PC (with optional USB connection)

[8663] Resource dispenser: secure, web based transfer of a variable quantity from “source” to “sink”

[8664] ID tag

[8665] Security pass inside offices

[8666] Set top box security

[8667] Car key

[8668] Car Petrol

[8669] Car manufacturer “genuine parts” detection, where the car requires genuine (or authorised) parts to function.

[8670] Aeroplane control on motor-control servos to allow secure external control on an aircraft in a hijack situation.

[8671] Security device for controlling access to and copying of audio, video, and data (e.g. preventing unauthorized downloading of music to a device).

[8672] 4 Exemplary Application Descriptions

[8673] 4.1 Car Petrol

[8674] Using mechanisms and protocols similar to those described in relation to ink refills, refilling of petrol can be controlled. An example of a commercial relationship this allows is selling a car at a discounted rate, but requiring that the car be refilled at designated service stations. Similarly, prevention of unauthorized servicing can be achieved.

[8675] 4.2 Car Keys

[8676] 4.2.1 Basic Advantages Over Physical Keys

[8677] Keys and locks can be easily programmed & configured for use

[8678] Can only be duplicated/reprogrammed by an authorised individual

[8679] The same key can be used for physical entry/exit and remote (radio-based) entry/exit

[8680] Inbuilt security features

[8681] 4.2.2 Single Key for Multiple Vehicles

[8682] Useful when a family has more than one car.

[8683] Can be programmed so any key fits any car.

[8684] Fewer duplicate keys are needed.

[8685] If the key for a particular car is misplaced, any key for any other car can be used instead of a duplicate of the same key.

[8686] 4.2.3 Multiple Keys for a Single Vehicle

[8687] 4.2.3.1 Same Company Car Being Driven by Multiple Drivers

[8688] Mileage can be logged per driver e.g. for accounting purposes.

[8689] Key permissions can be different per driver (e.g. boot/trunk access may be disabled)

[8690] 4.2.3.2 Same Family Car Being Driven by Children and Parents

[8691] Time/date restrictions can be applied to (e.g. children's) keys

[8692] Speeds above a specified limit (and duration of that speed) can be logged for auditing purposes (may be less dangerous than actually enforcing a speed limit)

[8693] 4.2.4 No Problem if Key Lost

[8694] Can easily:

[8695] make a new key the same as the lost one (existing copies of the key will still function)

[8696] reprogram the locks on the car (and reprogram all non-lost keys to match) so the lost key will no longer function

[8697] 4.2.5 No Problem if Key Left in Car

[8698] Easy to create a one-time-use open-door-only key via roadside assistance based on secret password information, driver's license etc. (prevents having to break into the car)

[8699] 4.2.6 Car Rentals

[8700] Key can have an expiration date (e.g. some period past the rental end-date)

[8701] 4.2.7 Single Physical Key for All Locks in Car

[8702] A single physical key can open all locks (door, immobiliser, boot/trunk, glovebox etc.).

[8703] The following documents are incorporated by cross-reference:

[8704] [1] IBM Cu-11 Databook: Macros, Mar. 21, 2002.

[8705] [2] Universal Serial Bus (USB) Specification Rev 1.1, Compaq Computer Corporation, Intel Corporation, Microsoft Corporation, NEC Corporation, Sep. 28, 1998.

[8706] [3] Synopsys DesignWare USB 1.1 OHCI Host Controller with AHB/PVCI Databook Version 2.6, February 2003.

[8707] [4] Open Host Controller Interface Specification (OpenHCI) for USB Rev 1.0a, Compaq, Microsoft, National Semiconductor, Sep. 14, 1999.

[8708] [5] inSilicon TymeWare USB 1.1 Device Controller (UDCVCI) Core User Manual Version 1.1, inSilicon Corporation, November 2000.

[8709] [6] Amphion, 2001, CS6150 Motion JPEG Decoder Databook, Amphion Semiconductor Ltd.

[8710] [7] ANSI/EIA 538-1988, Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Equipment, August 1988.

[8711] [8] Bender, W., P. Chesnais, S. Elo, A. Shaw, and M. Shaw, Enriching communities: Harbingers of news in the future, IBM Systems Journal, Vol. 35, Nos. 3&4, 1996, pp. 369-380.

[8712] [9] CCIR Rec. 601-2, Encoding Parameters of Digital Television for Studios, Recommendations of the CCIR, 1990, Vol XI-Part 1, Broadcasting Service (Television), pp. 95-104.

[8713] [10] Farrell, J., How to Allocate Bits to Optimize Photographic Image Quality, Proceedings of IS&T International Conference on Digital Printing Technologies, 1998, pp. 572-576.

[8714] [11] Humphreys, G. W., and V. Bruce, Visual Cognition, Lawrence Erlbaum Associates, 1989, p. 15.

[8715] [12] ISO/IEC 10918-1:1994, Information technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines, 1994.

[8716] [13] Lyppens, H., Reed-Solomon Error Correction, Dr. Dobb's Journal, Vol. 22, No. 1, January 1997.

[8717] [14] Olsen, J., Smoothing Enlarged Monochrome Images, in Glassner, A. S. (ed.), Graphics Gems, AP Professional, 1990.

[8718] [15] Rorabaugh, C., Error Coding Cookbook, McGraw-Hill, 1996.

[8719] [16] Thompson, H. S., Multilingual Corpus 1 CD-ROM, European Corpus Initiative.

[8720] [17] Urban, S. J., Review of standards for electronic imaging for facsimile systems, Journal of Electronic Imaging, Vol. 1(1), January 1992, pp. 5-21.

[8721] [18] Wallace, G. K., The JPEG Still Picture Compression Standard, Communications of the ACM, Vol. 34, No. 4, April 1991, pp. 30-44.

[8722] [19] Wicker, S., and Bhargava, V., Reed-Solomon Codes and their Applications, IEEE Press, 1994.

[8723] [20] Yasuda, Y., Overview of Digital Facsimile Coding Techniques in Japan, Proceedings of the IEEE, Vol. 68(7), July 1980, pp. 830-845.

[8724] [21] SPARC International, SPARC Architecture Manual, Version 8, Revision SAV080SI9308.

[8725] [22] Gaisler Research, The LEON-2 Processor User's Manual, Version 1.0.7, September 2002.

[8726] [23] ARM Limited, AMBA Specification, Rev 2.0, May 1999.

[8727] [24] ITU-T, Recommendation T.30, Procedures for document facsimile transmission in the general switched telephone network, 07/2003.

[8728] [25] Anderson, R., and Kuhn, M., 1997, Low Cost Attacks on Tamper Resistant Devices, Security Protocols, Proceedings 1997, LNCS 1361, B. Christianson, B. Crispo, M. Lomas, M. Roe, Eds., Springer-Verlag, pp. 125-136.

[8729] [26] Anderson, R., and Needham, R. M., Programming Satan's Computer, Computer Science Today, LNCS 1000, pp. 426-441.

[8730] [27] Atkins, D., Graff, M., Lenstra, A. K., and Leyland, P. C., 1995, The Magic Words Are Squeamish Ossifrage, Advances in Cryptology - ASIACRYPT '94 Proceedings, Springer-Verlag, pp. 263-277.

[8731] [28] Bains, S., 1997, Optical schemes tried out in IC test: IBM and Lucent teams take passive and active paths, respectively, to imaging, EETimes, Dec. 22, 1997.

[8732] [29] Bao, F., Deng, R. H., Yan, Y., Jeng, A., Narasimhalu, A. D., Ngair, T., 1997, Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults, Security Protocols, Proceedings 1997, LNCS 1361, B. Christianson, B. Crispo, M. Lomas, M. Roe, Eds., Springer-Verlag, pp. 115-124.

[8733] [30] Bellare, M., Canetti, R., and Krawczyk, H., 1996, Keying Hash Functions For Message Authentication, Advances in Cryptology, Proceedings Crypto'96, LNCS 1109, N. Koblitz, Ed., Springer-Verlag, 1996, pp. 1-15. Full version: http://www.research.ibm.com/security/keyed-md5.html

[8734] [31] Bellare, M., Canetti, R., and Krawczyk, H., 1996, The HMAC Construction, RSA Laboratories CryptoBytes, Vol. 2, No 1, 1996, pp. 12-15.

[8735] [32] Bellare, M., Guérin, R., and Rogaway, P., 1995, XOR MACs: New Methods For Message Authentication Using Finite Pseudorandom Functions, Advances in Cryptology, Proceedings Crypto'95, LNCS 963, D. Coppersmith, Ed., Springer-Verlag, 1995, pp. 15-28.

[8736] [33] Blaze, M., Diffie, W., Rivest, R., Schneier, B., Shimomura, T., Thompson, E., Wiener, M., 1996, Minimal Key Lengths For Symmetric Ciphers To Provide Adequate Commercial Security, A Report By an Ad Hoc Group of Cryptographers and Computer Scientists, Published on the internet: http://www.livelinks.com/livelinks/bsa/cryptographers.html

[8737] [34] Blum, L., Blum, M., and Shub, M., A Simple Unpredictable Pseudo-random Number Generator, SIAM Journal of Computing, vol 15, no 2, May 1986, pp 364-383.

[8738] [35] Bosselaers, A., and Preneel, B., editors, 1995, Integrity Primitives for Secure Information Systems: Final Report of RACE Integrity Primitives Evaluation RIPE-RACE 1040, LNCS 1007, Springer-Verlag, New York.

[8739] [36] Brassard, G., 1988, Modern Cryptography, a Tutorial, LNCS 325, Springer-Verlag.

[8740] [37] Canetti, R., 1997, Towards Realizing Random Oracles: Hash Functions That Hide All Partial Information, Advances in Cryptology, Proceedings Crypto'97, LNCS 1294, B. Kaliski, Ed., Springer-Verlag, pp. 455-469.

[8741] [38] Cheng, P., and Glenn, R., 1997, Test Cases for HMAC-MD5 and HMAC-SHA-1, Network Working Group RFC 2202, http://reference.ncrs.usda.gov/ietf/rfc/2300/rfc2202.htm

[8742] [39] Diffie, W., and Hellman, M. E., 1976, Multiuser Cryptographic Techniques, AFIPS national Computer Conference, Proceedings '76, pp. 109-112.

[8743] [40] Diffie, W., and Hellman, M. E., 1976, New Directions in Cryptography, IEEE Transactions on Information Theory, Volume IT-22, No 6 (November 1976), pp. 644-654.

[8744] [41] Diffie, W., and Hellman, M. E., 1977, Exhaustive Cryptanalysis of the NBS Data Encryption Standard, Computer, Volume 10, No 6, (June 1977), pp. 74-84.

[8745] [42] Dobbertin, H., 1995, Alf Swindles Ann, RSA Laboratories CryptoBytes, Volume 1, No 3, p. 5.

[8746] [43] Dobbertin, H., 1996, Cryptanalysis of MD4, Fast Software Encryption - Cambridge Workshop, LNCS 1039, Springer-Verlag, 1996, pp. 53-69.

[8747] [44] Dobbertin, H., 1996, The Status of MD5 After a Recent Attack, RSA Laboratories CryptoBytes, Volume 2, No 2, pp. 1, 3-6.

[8748] [45] Dreifus, H., and Monk, J. T., 1988, Smart Cards: A Guide to Building and Managing Smart Card Applications, John Wiley and Sons.

[8749] [46] ElGamal, T., 1985, A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms, Advances in Cryptography, Proceedings Crypto'84, LNCS 196, Springer-Verlag, pp. 10-18.

[8750] [47] ElGamal, T., 1985, A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms, IEEE Transactions on Information Theory, Volume 31, No 4, pp. 469-472.

[8751] [48] Feige, U., Fiat, A., and Shamir, A., 1988, Zero Knowledge Proofs of Identity, J Cryptography, Volume 1, pp. 77-94.

[8752] [49] Feigenbaum, J., 1992, Overview of Interactive Proof Systems and Zero-Knowledge, Contemporary Cryptology: The Science of Information Integrity, G. Simmons, Ed., IEEE Press, New York.

[8753] [50] FIPS 46-1, 1977, Data Encryption Standard, NIST, US Department of Commerce, Washington D.C., January 1977.

[8754] [51] FIPS 180, 1993, Secure Hash Standard, NIST, US Department of Commerce, Washington D.C., May 1993.

[8755] [52] FIPS 180-1, 1995, Secure Hash Standard, NIST, US Department of Commerce, Washington D.C., April 1995.

[8756] [53] FIPS 186, 1994, Digital Signature Standard, NIST, US Department of Commerce, Washington D.C., 1994.

[8757] [54] Gardner, M., 1977, A New Kind of Cipher That Would Take Millions of Years to Break, Scientific American, Vol. 237, No. 8, pp. 120-124.

[8758] [55] Girard, P., Roche, F. M., Pistoulet, B., 1986, Electron Beam Effects on VLSI MOS Conditions for Testing and Reconfiguration, Wafer-Scale Integration, G. Saucier and J. Trihle, Eds., Amsterdam.

[8759] [56] Girard, P., Pistoulet, B., Valenza, M., and Lorival, R., 1987, Electron Beam Switching of Floating Gate MOS Transistors, IFIP International Workshop on Wafer Scale International, Brunel University, Sep. 23-25, 1987.

[8760] [57] Goldberg, I., and Wagner, D., 1996, Randomness and the Netscape Browser, Dr. Dobb's Journal, January 1996.

[8761] [58] Guillou, L. G., Ugon, M., and Quisquater, J., 1992, The Smart Card, Contemporary Cryptology: The Science of Information Integrity, G. Simmons, Ed., IEEE Press, New York.

[8762] [59] Gutman, P., 1996, Secure Deletion of Data From Magnetic and Solid-State Memory, Sixth USENIX Security Symposium Proceedings (July 1996), pp. 77-89.

[8763] [60] Hendry, M., 1997, Smart Card Security and Applications, Artech House, Norwood Mass.

[8764] [61] Holgate, S. A., 1998, Sensing is Believing, New Scientist, 15 Aug. 1998, p. 20.

[8765] [62] Johansson, T., 1997, Bucket Hashing with a Small Key Size, Advances in Cryptology, Proceedings Eurocrypt'97, LNCS 1233, W. Fumy, Ed., Springer-Verlag, pp. 149-162.

[8766] [63] Kahn, D., 1967, The Codebreakers: The Story of Secret Writing, New York: Macmillan Publishing Co.

[8767] [64] Kaliski, B., 1991, Letter to NIST regarding DSS, 4 Nov. 1991.

[8768] [65] Kaliski, B., 1998, New Threat Discovered and Fixed, RSA Laboratories Web site http://www.rsa.com/rsalabs/pkcs1

[8769] [66] Kaliski, B., and Robshaw, M., 1995, Message Authentication With MD5, RSA Laboratories CryptoBytes, Volume 1, No 1, pp. 5-8.

[8770] [67] Kaliski, B., and Yin, Y. L., 1995, On Differential and Linear Cryptanalysis of the RC5 Encryption Algorithm, Advances in Cryptology, Proceedings Crypto '95, LNCS 963, D. Coppersmith, Ed., Springer-Verlag, pp. 171-184.

[8771] [68] Klapper, A., and Goresky, M., 1994, 2-Adic Shift Registers, Fast Software Encryption: Proceedings Cambridge Security Workshop '93, LNCS 809, R. Anderson, Ed., Springer-Verlag, pp. 174-178.

[8772] [69] Klapper, A., 1996, On the Existence of Secure Feedback Registers, Advances in Cryptology, Proceedings Eurocrypt'96, LNCS 1070, U. Maurer, Ed., Springer-Verlag, pp. 256-267.

[8773] [70] Kleiner, K., 1998, Cashing in on the not so smart cards, New Scientist, 20 Jun. 1998, p. 12.

[8774] [71] Knudsen, L. R., and Lai, X., Improved Differential Attacks on RC5, Advances in Cryptology, Proceedings Crypto'96, LNCS 1109, N. Koblitz, Ed., Springer-Verlag, 1996, pp. 216-228.

[8775] [72] Knuth, D. E., 1998, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition, Addison-Wesley.

[8776] [73] Krawczyk, H., 1995, New Hash Functions for Message Authentication, Advances in Cryptology, Proceedings Eurocrypt'95, LNCS 921, L. Guillou, J. Quisquater, (editors), Springer-Verlag, pp. 301-310.

[8777] [74] Krawczyk, H., 199x, Network Encryption - History and Patents, internet publication:

[8778] http://www.cygnus.com/~gnu/netcrypt.html

[8779] [75] Krawczyk, H., Bellare, M., Canetti, R., 1997, HMAC: Keyed Hashing for Message Authentication, Network Working Group RFC 2104, http://reference.ncrs.usda.gov/ietf/rfc/2200/rfc2104.htm

[8780] [76] Lai, X., 1992, On the Design and Security of Block Ciphers, ETH Series in Information Processing, J. L. Massey (editor), Volume 1, Konstanz: Hartung-Gorre Verlag (Zurich).

[8781] [77] Lai, X., and Massey, J. L., 1991, A Proposal for a New Block Encryption Standard, Advances in Cryptology, Proceedings Eurocrypt'90, LNCS 473, Springer-Verlag, pp. 389-404.

[8782] [78] Massey, J. L., 1969, Shift Register Sequences and BCH Decoding, IEEE Transactions on Information Theory, IT-15, pp. 122-127.

[8783] [79] Mende, B., Noll, L., and Sisodiya, S., 1997, How Lavarand Works, Silicon Graphics Incorporated, published on Internet: http://lavarand.sgi.com (also reported in Scientific American, November 1997 p. 18, and New Scientist, 8 Nov. 1997).

[8784] [80] Menezes, A. J., van Oorschot, P. C., Vanstone, S. A., 1997, Handbook of Applied Cryptography, CRC Press.

[8785] [81] Merkle, R. C., 1978, Secure Communication Over Insecure Channels, Communications of the ACM, Volume 21, No 4, pp. 294-299.

[8786] [82] Montgomery, P. L., 1985, Modular Multiplication Without Trial Division, Mathematics of Computation, Volume 44, Number 170, pp. 519-521.

[8787] [83] Moreau, T., A Practical “Perfect” Pseudo-Random Number Generator, paper submitted to Computers in Physics on Feb. 27, 1996, Internet version: http://www.connotech.com/BBS.HTM

[8788] [84] Moreau, T., 1997, Pseudo-Random Generators, a High-Level Survey-in-Progress, Published on the internet: http://www.cabano.com/connotech/RNG.HTM

[8789] [85] NIST, 1994, Digital Signature Standard, NIST ISL Bulletin, online version at http://csrc.ncsl.nist.gov/nistbul/cs194-11.txt

[8790] [86] Oehler, M., Glenn, R., 1997, HMAC-MD5 IP Authentication with Replay Prevention, Network Working Group RFC 2085, http://reference.ncrs.usda.gov/ietf/rfc/2100/rfc2085.txt

[8791] [87] Oppliger, R., 1996, Authentication Systems For Secure Networks, Artech House, Norwood Mass.

[8792] [88] Preneel, B., van Oorschot, P. C., 1996, MDx-MAC And Building Fast MACs From Hash Functions, Advances in Cryptology, Proceedings Crypto'95, LNCS 963, D. Coppersmith, Ed., Springer-Verlag, pp. 1-14.

[8793] [89] Preneel, B., van Oorschot, P. C., 1996, On the Security of Two MAC Algorithms, Advances in Cryptology, Proceedings Eurocrypt'96, LNCS 1070, U. Maurer, Ed., Springer-Verlag, 1996, pp. 19-32.

[8794] [90] Preneel, B., Bosselaers, A., Dobbertin, H., 1997, The Cryptographic Hash Function RIPEMD-160, CryptoBytes, Volume 3, No 2, 1997, pp. 9-14.

[8795] [91] Rankl, W., and Effing, W., 1997, Smart Card Handbook, John Wiley and Sons (first published as Handbuch der Chipkarten, Carl Hanser Verlag, Munich, 1995).

[8796] [92] Ritter, T., 1991, The Efficient Generation of Cryptographic Confusion Sequences, Cryptologia, Volume 15, No 2, pp. 81-139.

[8797] [93] Rivest, R. L., 1993, Dr. Ron Rivest on the Difficulties of Factoring, Ciphertext: The RSA Newsletter, Vol 1, No 1, pp. 6, 8.

[8798] [94] Rivest, R. L., 1991, The MD4 Message-Digest Algorithm, Advances in Cryptology, Proceedings Crypto'90, LNCS 537, S. Vanstone, Ed., Springer-Verlag, pp. 301-311.

[8799] [95] Rivest, R. L., 1992, The RC4 Encryption Algorithm, RSA Data Security Inc. (This document has not been made public).

[8800] [96] Rivest, R. L., 1992, The MD4 Message-Digest Algorithm, Request for Comments (RFC) 1320, Internet Activities Board, Internet Privacy Task Force, April 1992.

[8801] [97] Rivest, R. L., 1992, The MD5 Message-Digest Algorithm, Request for Comments (RFC) 1321, Internet Activities Board, Internet Privacy Task Force.

[8802] [98] Rivest, R. L., 1995, The RC5 Encryption Algorithm, FastSoftware Encryption, LNCS 1008, Springer-Verlag, pp. 86-96.

[8803] [99] Rivest, R. L., Shamir, A., and Adleman, L. M., 1978, AMethod For Obtaining Digital Signatures and Public-Key Cryptosystems,Communications of the ACM, Volume 21, No 2, pp. 120-126.

[8804] [100] Schneier, B., 1994, Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish), Fast Software Encryption (December 1993), LNCS 809, Springer-Verlag, pp. 191-204.

[8805] [101] Schneier, B., 1995, The Blowfish Encryption Algorithm: One Year Later, Dr Dobb's Journal, September 1995.

[8806] [102] Schneier, B., 1996, Applied Cryptography, Wiley Press.

[8807] [103] Schneier, B., 1998, The Blowfish Encryption Algorithm, revision date Feb. 25, 1998, http://www.counterpane.com/blowfish.html

[8808] [104] Schneier, B., 1998, The Crypto Bomb is Ticking, Byte Magazine, May 1998, pp. 97-102.

[8809] [105] Schnorr, C. P., 1990, Efficient Identification and Signatures for Smart Cards, Advances in Cryptology, Proceedings Eurocrypt'89, LNCS 435, Springer-Verlag, pp. 239-252.

[8810] [106] Shamir, A., and Fiat, A., Method, Apparatus and Article For Identification and Signature, U.S. Pat. No. 4,748,668, 31 May 1988.

[8811] [107] Shor, P. W., 1994, Algorithms for Quantum Computation: Discrete Logarithms and Factoring, Proc. 35th Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, Los Alamitos, Calif., 1994.

[8812] [108] Simmons, G. J., 1992, A Survey of Information Authentication, Contemporary Cryptology: The Science of Information Integrity, G. Simmons, Ed., IEEE Press, New York.

[8813] [109] Tewksbury, S. K., 1998, Architectural Fault Tolerance, Integrated Circuit Manufacturability, Pineda de Gyvez, J., and Pradhan, D. K., Eds., IEEE Press, New York.

[8814] [110] TSMC, 2000, SFC0008_08B9_HE, 8K×8 Embedded Flash Memory Specification, Rev 0.1.

[8815] [111] Tsudik, G., 1992, Message Authentication With One-way Hash Functions, Proceedings of Infocom '92 (Also in Access Control and Policy Enforcement in Internetworks, Ph.D. Dissertation, Computer Science Department, University of Southern California, April 1991).

[8816] [112] Vallett, D., Kash, J., and Tsang, J., Watching Chips Work, IBM MicroNews, Vol 4, No 1, 1998.

[8817] [113] Vazirani, U. V., and Vazirani, V. V., 1984, Efficient and Secure Random Number Generation, 25th Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, 1984, pp. 458-463.

[8818] [114] Wagner, D., Goldberg, I., and Briceno, M., 1998, GSM Cloning, ISAAC Research Group, University of California, http://www.isaac.cs.berkeley.edu/isaac/gsm-faq.html

[8819] [115] Wiener, M. J., 1997, Efficient DES Key Search: An Update, RSA Laboratories CryptoBytes, Volume 3, No 2, pp. 6-8.

[8820] [116] Zoreda, J. L., and Otón, J. M., 1994, Smart Cards, Artech House, Norwood, Mass.

[8821] [117] Authentication Protocols v0_2, 29 November 2002, Silverbrook Research.

[8822] [118] H. Krawczyk IBM, M. Bellare UCSD, R. Canetti IBM, RFC 2104, February 1997, http://www.ietf.org/rfc/rfc2104.txt

[8823] [119] 4-3-1-2-QAChipSpec v4_09, 17 April 2003, Silverbrook Research.

[8824] [120] 4-4-9-4 SoPEC_hardware_design_v3_1, 28 Jan. 2003.

[8825] [121] Silverbrook Research, 2002, 4-3-1-8 QID Requirements Specification.

[8826] [122] TSMC, Oct. 1, 2000, SFC008_08B9_HE, 8K×8 Embedded Flash Memory Specification, Rev 0.1.

[8827] [123] TSMC (design service division), Sep. 10, 2001, 0.25 um Embedded Flash Test Mode User Guide, V0.3.

[8828] [124] TSMC (EmbFlash product marketing), Oct. 19, 2001, 0.25 um Application Note, V2.2.

[8829] [125] Artisan Components, Jan 99, Process Perfect Library Databook 2.5-Volt Standard Cells, Rev1.0.

[8830] [126] “4-3-1-2 QA Chip Technical Reference v 4.06”, Silverbrook Research.

[8831] [127] National Institute of Standards and Technology, Federal Information Processing Standards, http://csrc.nist.gov/publications/fips/

[8832] [128] U.S. patent application Ser. No. 09/575,108, filed May 23, 2000, Silverbrook Research Pty Ltd

[8833] [129] U.S. patent application Ser. No. 09/575,109, filed May 23, 2000, Silverbrook Research Pty Ltd

[8834] [130] U.S. patent application Ser. No. 09/575,100, filed May 23, 2000, Silverbrook Research Pty Ltd

[8835] [131] U.S. patent application Ser. No. 09/607,985, filed May 23, 2000, Silverbrook Research Pty Ltd

[8836] [132] U.S. Pat. No. 6,398,332, filed Jun. 30, 2000, Silverbrook Research Pty Ltd

[8837] [133] U.S. Pat. No. 6,394,573, filed Jun. 30, 2000, Silverbrook Research Pty Ltd

[8838] [134] U.S. Pat. No. 6,622,923, filed Jun. 30, 2000, Silverbrook Research Pty Ltd

[8839] [135] U.S. patent application Ser. No. 09/517,539, filed Mar. 2, 2000, Silverbrook Research Pty Ltd

[8840] [136] U.S. Pat. No. 6,566,858, filed Jul. 10, 1998, Silverbrook Research Pty Ltd

[8841] [137] U.S. Pat. No. 6,331,946, filed Jul. 10, 1998, Silverbrook Research Pty Ltd

[8842] [138] U.S. Pat. No. 6,246,970, filed Jul. 10, 1998, Silverbrook Research Pty Ltd

[8843] [139] U.S. Pat. No. 6,442,525, filed Jul. 10, 1998, Silverbrook Research Pty Ltd

[8844] [140] U.S. patent application Ser. No. 09/517,384, filed Mar. 2, 2000, Silverbrook Research Pty Ltd

[8845] [141] U.S. patent application Ser. No. 09/505,951, filed Feb. 21, 2001, Silverbrook Research Pty Ltd

[8846] [142] U.S. Pat. No. 6,374,354, filed Mar. 2, 2000, Silverbrook Research Pty Ltd

[8847] [143] U.S. patent application Ser. No. 09/517,608, filed Mar. 2, 2000, Silverbrook Research Pty Ltd

[8848] [144] U.S. Pat. No. 6,334,190, filed Mar. 2, 2000, Silverbrook Research Pty Ltd

[8849] [145] U.S. patent application Ser. No. 09/517,541, filed Mar. 2, 2000, Silverbrook Research Pty Ltd

1. A printer controller configured to generate dot data for supply to a printhead that includes at least first and second longitudinally extending printhead chips that are positioned adjacent each other either side of a join region such that a printing width of the printhead is wider than the length of either printhead chip, the printer controller being configured such that, in the event that the printhead chips to which dot data is being supplied are of sufficiently unequal relative length, the dot data is supplied more frequently, or at a higher rate, to the longer of the two printhead chips.
2. A printer controller according to claim 1, configured to supply the dot data to the printhead modules such that none of the printhead modules is full and ready for printing substantially earlier than any of the other printhead modules.
3. A printer controller according to claim 1, wherein the dot data is supplied to the printhead from a memory under the control of the printhead controller.
4. A printer controller according to claim 1, including a hardware module for undertaking the task of bandwidth management.
5. A printer controller according to claim 4, wherein the hardware module is also configured to compensate for different length printheads.
6. A printer controller according to claim 1, configured to manipulate the supply of dot data to each of the printhead modules such that memory bandwidth usage is substantially constant during a printhead loading cycle.
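
By way of illustration only, the proportional supply of dot data recited in claims 1, 2 and 6 can be sketched in software. The sketch below is not the controller's actual hardware logic: the function name `interleave_dot_words`, the use of per-chip word counts as a stand-in for chip length, and the "least proportionally full chip goes next" rule are all assumptions made for this example. It shows one way of ordering transfers so that a longer chip receives data more often, one word is fetched per step (a constant memory demand), and no chip is filled much earlier than another.

```python
from fractions import Fraction


def interleave_dot_words(chip_lengths):
    """Return the order in which single dot-data words are sent to chips.

    chip_lengths[i] is the number of words chip i needs for one line
    (hypothetical unit, standing in for the chip's physical length).
    Each step sends exactly one word, so memory bandwidth use is constant.
    """
    sent = [0] * len(chip_lengths)
    order = []
    for _ in range(sum(chip_lengths)):
        # Only chips that still need data are candidates.
        candidates = [i for i, n in enumerate(chip_lengths) if sent[i] < n]
        # Pick the chip that is proportionally least full; ties go to the
        # lower index. Exact fractions avoid floating-point tie errors.
        chip = min(candidates,
                   key=lambda i: (Fraction(sent[i], chip_lengths[i]), i))
        sent[chip] += 1
        order.append(chip)
    return order


if __name__ == "__main__":
    # A chip three times as long is served three times as often.
    print(interleave_dot_words([6, 2]))  # [0, 1, 0, 0, 0, 1, 0, 0]
```

With equal lengths the schedule degenerates to simple alternation, which corresponds to the case in which no rate equalisation is required.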